FOREWORD BY ANDREW WILES


I had the great good fortune to have a high school mathematics teacher who 
had studied number theory. At his suggestion I acquired a copy of the fourth 
edition of Hardy and Wright’s marvellous book An Introduction to the The- 
ory of Numbers. This, together with Davenport’s The Higher Arithmetic, 
became my favourite introductory books in the subject. Scouring the pages 
of the text for clues about the Fermat problem (I was already obsessed) I 
learned for the first time about the real breadth of number theory. Only four 
of the chapters in the middle of the book were about quadratic fields and 
Diophantine equations, and much of the rest of the material was new to 
me; Diophantine geometry, round numbers, Dirichlet’s theorem, continued 
fractions, quaternions, reciprocity . . . The list went on and on. 

The book became a starting point for ventures into the different branches 
of the subject. For me the first quest was to find out more about alge- 
braic number theory and Kummer’s theory in particular. The more analytic 
parts did not have the same attraction then and did not really catch my 
imagination until I had learned some complex analysis. Only then could I 
appreciate the power of the zeta function. However, the book was always 
there as a starting point which I could return to whenever I was intrigued 
by a new piece of theory, sometimes many years later. Part of the success 
of the book lay in its extensive notes and references which gave naviga- 
tional hints for the inexperienced mathematician. This part of the book 
has been updated and extended by Roger Heath-Brown so that a 21st-century
student can profit from more recent discoveries and texts. This is
in the style of his wonderful commentary on Titchmarsh’s The Theory of 
the Riemann Zeta Function. It will be an invaluable aid to the new reader 
but it will also be a great pleasure to those who have read the book in 
their youth, a bit like hearing the life stories of one’s erstwhile school 
friends. 

A final chapter has been added giving an account of the theory of ellip- 
tic curves. Although this theory is not described in the original editions 
(except for a brief reference in the notes to §13.6) it has proved to be crit- 
ical in the study of Diophantine equations and of the Fermat equation in 
particular. Through the Birch and Swinnerton-Dyer conjecture on the one 
hand and through the extraordinary link with the Fermat equation on the 
other it has become a central part of the number theorist’s life. It even 
played a central role in the effective resolution of a famous class number 
problem of Gauss. All this would have seemed absurdly improbable when 
the book was written. It is thus an appropriate ending for the new edition
to have a lucid exposition of this theory by Joe Silverman. Of course it is 
only a quick sketch of the theory and the reader will surely be tempted to 
devote many hours, if not the best part of a lifetime, to unravelling its many 
mysteries. 


January, 2008 


A.J.W. 



PREFACE TO THE SIXTH EDITION 

This sixth edition contains a considerable expansion of the end-of-chapter 
notes. There have been many exciting developments since these were last 
revised, which are now described in the notes. It is hoped that these will 
provide an avenue leading the interested reader towards current research 
areas. The notes for some chapters were written with the generous help of 
other authorities. Professor D. Masser updated the material on Chapters 
4 and 11, while Professor G.E. Andrews did the same for Chapter 19. A 
substantial amount of new material was added to the notes for Chapter 21 
by Professor T.D. Wooley, and a similar review of the notes for Chapter 24 
was undertaken by Professor R. Hans-Gill. We are naturally very grateful 
to all of them for their assistance. 

In addition, we have added a substantial new chapter, dealing with ellip- 
tic curves. This subject, which was not mentioned in earlier editions, has 
come to be such a central topic in the theory of numbers that it was felt 
to deserve a full treatment. The material is naturally connected with the 
original chapter on Diophantine Equations. 

Finally, we have corrected a significant number of misprints in the 
fifth edition. A large number of correspondents reported typographical or 
mathematical errors, and we thank everyone who contributed in this way. 

The proposal to produce this new edition originally came from Professors 
John Maitland Wright and John Coates. We are very grateful for their 
enthusiastic support. 

D.R.H.-B. 

J.H.S. 

September, 2007 





PREFACE TO THE FIFTH EDITION 


The main changes in this edition are in the Notes at the end of each chapter. 
I have sought to provide up-to-date references for the reader who wishes 
to pursue a particular topic further and to present, both in the Notes and in 
the text, a reasonably accurate account of the present state of knowledge. 
For this I have been dependent on the relevant sections of those invaluable 
publications, the Zentralblatt and the Mathematical Reviews. But I was 
also greatly helped by several correspondents who suggested amendments 
or answered queries. I am especially grateful to Professors J. W. S. Cassels 
and H. Halberstam, each of whom supplied me at my request with a long 
and most valuable list of suggestions and references. 

There is a new, more transparent proof of Theorem 445 and an account of 
my changed opinion about Theodorus’ method in irrationals. To facilitate 
the use of this edition for reference purposes, I have, so far as possible, kept 
the page numbers unchanged. For this reason, I have added a short appendix 
on recent progress in some aspects of the theory of prime numbers, rather 
than insert the material in the appropriate places in the text. 

E. M. W. 

Aberdeen 
October 1978 



PREFACE TO THE FIRST EDITION 


This book has developed gradually from lectures delivered in a number 
of universities during the last ten years, and, like many books which have 
grown out of lectures, it has no very definite plan. 

It is not in any sense (as an expert can see by reading the table of contents) 
a systematic treatise on the theory of numbers. It does not even contain a 
fully reasoned account of any one side of that many-sided theory, but is 
an introduction, or a series of introductions, to almost all of these sides 
in turn. We say something about each of a number of subjects which are 
not usually combined in a single volume, and about some which are not 
always regarded as forming part of the theory of numbers at all. Thus Chs.
XII-XV belong to the ‘algebraic’ theory of numbers, Chs. XIX-XXI to
the ‘additive’, and Ch. XXII to the ‘analytic’ theories; while Chs. III, XI,
XXIII, and XXIV deal with matters usually classified under the headings 
of ‘geometry of numbers’ or ‘Diophantine approximation’. There is plenty 
of variety in our programme, but very little depth; it is impossible, in 400 
pages, to treat any of these many topics at all profoundly. 

There are large gaps in the book which will be noticed at once by any 
expert. The most conspicuous is the omission of any account of the theory of 
quadratic forms. This theory has been developed more systematically than 
any other part of the theory of numbers, and there are good discussions of 
it in easily accessible books. We had to omit something, and this seemed to 
us the part of the theory where we had the least to add to existing accounts. 

We have often allowed our personal interests to decide our programme,
and have selected subjects less because of their importance (though most 
of them are important enough) than because we found them congenial and 
because other writers have left us something to say. Our first aim has been 
to write an interesting book, and one unlike other books. We may have 
succeeded at the price of too much eccentricity, or we may have failed; but 
we can hardly have failed completely, the subject-matter being so attractive 
that only extravagant incompetence could make it dull. 

The book is written for mathematicians, but it does not demand any great 
mathematical knowledge or technique. In the first eighteen chapters we 
assume nothing that is not commonly taught in schools, and any intelligent 
university student should find them comparatively easy reading. The last 
six are more difficult, and in them we presuppose a little more, but nothing 
beyond the content of the simpler university courses. 

The title is the same as that of a very well-known book by Professor 
L. E. Dickson (with which ours has little in common). We proposed at one 
time to change it to An introduction to arithmetic, a more novel and in some
ways a more appropriate title; but it was pointed out that this might lead to 
misunderstandings about the content of the book. 

A number of friends have helped us in the preparation of the book. Dr. H. 
Heilbronn has read all of it both in manuscript and in print, and his criticisms 
and suggestions have led to many very substantial improvements, the most 
important of which are acknowledged in the text. Dr. H. S. A. Potter and 
Dr. S. Wylie have read the proofs and helped us to remove many errors and 
obscurities. They have also checked most of the references to the literature 
in the notes at the ends of the chapters. Dr. H. Davenport and Dr. R. Rado 
have also read parts of the book, and in particular the last chapter, which, 
after their suggestions and Dr. Heilbronn’s, bears very little resemblance 
to the original draft. 

We have borrowed freely from the other books which are catalogued 
on pp. 417-19 [pp. 596-9 in current 6th edn.], and especially from those 
of Landau and Perron. To Landau in particular we, in common with all 
serious students of the theory of numbers, owe a debt which we could 
hardly overstate. 

G. H. H. 

E. M. W. 

Oxford 
August 1938 



REMARKS ON NOTATION 


We borrow four symbols from formal logic, viz.

$\to$, $\equiv$, $\exists$, $\in$.

$\to$ is to be read as ‘implies’. Thus

$l \mid m \to l \mid n$ (p. 2)

means ‘ “$l$ is a divisor of $m$” implies “$l$ is a divisor of $n$” ’, or, what is the
same thing, ‘if $l$ divides $m$ then $l$ divides $n$’; and

$b \mid a \,.\, c \mid b \to c \mid a$ (p. 1)

means ‘if $b$ divides $a$ and $c$ divides $b$ then $c$ divides $a$’.

$\equiv$ is to be read ‘is equivalent to’. Thus

$m \mid ka - ka' \equiv m_1 \mid a - a'$ (p. 61)

means that the assertions ‘$m$ divides $ka - ka'$’ and ‘$m_1$ divides $a - a'$’ are
equivalent; either implies the other.

These two symbols must be distinguished carefully from $\to$ (tends to)
and $\equiv$ (is congruent to). There can hardly be any misunderstanding, since
$\to$ and $\equiv$ are always relations between propositions.

$\exists$ is to be read as ‘there is an’. Thus

$\exists l \,.\, 1 < l < m \,.\, l \mid m$ (p. 2)

means ‘there is an $l$ such that (i) $1 < l < m$ and (ii) $l$ divides $m$’.

$\in$ is the relation of a member of a class to the class. Thus

$m \in S \,.\, n \in S \to (m \pm n) \in S$ (p. 23)

means ‘if $m$ and $n$ are members of $S$ then $m + n$ and $m - n$ are members
of $S$’.

A star affixed to the number of a theorem (e.g. Theorem 15*) means that
the proof of the theorem is too difficult to be included in the book. It is not 
affixed to theorems which are not proved but may be proved by arguments 
similar to those used in the text. 
1.1. Divisibility of integers. The numbers

..., -3, -2, -1, 0, 1, 2, ...

are called the rational integers, or simply the integers; the numbers

0, 1, 2, 3, ...

the non-negative integers; and the numbers

1, 2, 3, ...

the positive integers. The positive integers form the primary subject-matter 
of arithmetic, but it is often essential to regard them as a subclass of the 
integers or of some larger class of numbers. 

In what follows the letters 

a, b,...,n,p,...,x,y,... 

will usually denote integers, which will sometimes, but not always, be 
subject to further restrictions, such as to be positive or non-negative. We 
shall often use the word ‘number’ as meaning ‘integer’ (or ‘positive int- 
eger’, etc.), when it is clear from the context that we are considering only 
numbers of this particular class. 

An integer a is said to be divisible by another integer b, not 0, if there is 
a third integer c such that 


$a = bc$.

If a and b are positive, c is necessarily positive. We express the fact that a
is divisible by b, or b is a divisor of a, by

$b \mid a$.

Thus

$1 \mid a$, $a \mid a$;

and $b \mid 0$ for every b but 0. We shall also sometimes use

$b \nmid a$

to express the contrary of $b \mid a$. It is plain that

$b \mid a \,.\, c \mid b \to c \mid a$,
$b \mid a \to bc \mid ac$

if $c \neq 0$, and

$c \mid a \,.\, c \mid b \to c \mid ma + nb$

for all integral m and n.
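
The three rules just stated can be spot-checked numerically. The following Python sketch is an illustration only and is not part of the original text; the names `divides` and `check_divisibility_rules` are hypothetical, and a finite search is evidence rather than a proof.

```python
from itertools import product


def divides(b: int, a: int) -> bool:
    """Return True when b | a, i.e. a = b*c for some integer c (b must not be 0)."""
    if b == 0:
        raise ValueError("the divisor b must not be 0")
    return a % b == 0


def check_divisibility_rules(limit: int = 8) -> bool:
    """Test the three rules of § 1.1 for all integers of absolute value <= limit."""
    values = range(-limit, limit + 1)
    nonzero = [v for v in values if v != 0]
    # b | a . c | b -> c | a, and b | a -> bc | ac (c != 0)
    for a, b, c in product(values, nonzero, nonzero):
        if divides(b, a) and divides(c, b) and not divides(c, a):
            return False
        if divides(b, a) and not divides(b * c, a * c):
            return False
    # c | a . c | b -> c | ma + nb, tried for a small range of m and n
    for a, b, c in product(values, values, nonzero):
        if divides(c, a) and divides(c, b):
            for m, n in product(range(-3, 4), repeat=2):
                if not divides(c, m * a + n * b):
                    return False
    return True


print(check_divisibility_rules())
```

The convention that the divisor is never 0 is enforced by raising an error, which matches the definition above.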

1.2. Prime numbers. In this section and until § 2.9 the numbers considered
are generally positive integers.† Among the positive integers there
is a sub-class of peculiar importance, the class of primes. A number p is
said to be prime if

(i) $p > 1$,

(ii) p has no positive divisors except 1 and p.

For example, 37 is a prime. It is important to observe that 1 is not reckoned
as a prime. In this and the next chapter we reserve the letter p for primes.‡
A number greater than 1 and not prime is called composite.

Our first theorem is 

Theorem 1. Every positive integer, except 1, is a product of primes. 

Either n is prime, when there is nothing to prove, or n has divisors
between 1 and n. If m is the least of these divisors, m is prime; for otherwise

$\exists l \,.\, 1 < l < m \,.\, l \mid m$;

and

$l \mid m \to l \mid n$,

which contradicts the definition of m.

Hence n is prime or divisible by a prime less than n, say $p_1$, in which
case

$n = p_1 n_1, \quad 1 < n_1 < n$.

† There are occasional exceptions, as in § 1.7, where $e^x$ is the exponential function of analysis.
‡ It would be inconvenient to have to observe this convention rigidly throughout the book, and
we often depart from it. In Ch. IX, for example, we use p/q for a typical rational fraction, and p is
not usually prime. But p is the ‘natural’ letter for a prime, and we give it preference when we can
conveniently.
Here either $n_1$ is prime, in which case the proof is completed, or it is
divisible by a prime $p_2$ less than $n_1$, in which case

$n = p_1 n_1 = p_1 p_2 n_2, \quad 1 < n_2 < n_1 < n$.

Repeating the argument, we obtain a sequence of decreasing numbers
$n, n_1, \ldots, n_{k-1}, \ldots$, all greater than 1, for each of which the same
alternative presents itself. Sooner or later we must accept the first alternative,
that $n_{k-1}$ is a prime, say $p_k$, and then

(1.2.1) $n = p_1 p_2 \ldots p_k$.

Thus

$666 = 2 \cdot 3 \cdot 3 \cdot 37$.

If $ab = n$, then a and b cannot both exceed $\sqrt{n}$. Hence any composite n is
divisible by a prime p which does not exceed $\sqrt{n}$.

The primes in (1.2.1) are not necessarily distinct, nor arranged in any 
particular order. If we arrange them in increasing order, associate sets of 
equal primes into single factors, and change the notation appropriately, we 
obtain 

(1.2.2) $n = p_1^{a_1} p_2^{a_2} \ldots p_k^{a_k} \quad (a_1 > 0, a_2 > 0, \ldots, \; p_1 < p_2 < \ldots)$.

We then say that n is expressed in standard form. 
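
The proof of Theorem 1 is effectively a procedure: repeatedly split off the least divisor greater than 1, which is necessarily prime. A minimal Python sketch of that procedure follows; the function names are hypothetical, and the search for the least divisor stops at $\sqrt{n}$, using the remark that a composite n has a prime divisor not exceeding $\sqrt{n}$.

```python
from collections import Counter


def least_divisor(n: int) -> int:
    """Smallest divisor of n greater than 1 (n itself when n is prime); needs n > 1."""
    d = 2
    while d * d <= n:          # a composite n has a divisor not exceeding sqrt(n)
        if n % d == 0:
            return d
        d += 1
    return n


def prime_factors(n: int) -> list[int]:
    """The primes p_1, p_2, ..., p_k of (1.2.1), in the order the proof finds them."""
    if n < 2:
        raise ValueError("Theorem 1 concerns integers greater than 1")
    factors = []
    while n > 1:
        p = least_divisor(n)   # the least divisor > 1 is prime
        factors.append(p)
        n //= p                # n_1, n_2, ... form a decreasing sequence
    return factors


def standard_form(n: int) -> list[tuple[int, int]]:
    """Pairs (p_i, a_i) of (1.2.2), with the primes in increasing order."""
    return sorted(Counter(prime_factors(n)).items())


print(prime_factors(666))   # [2, 3, 3, 37]
print(standard_form(666))   # [(2, 1), (3, 2), (37, 1)]
```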

1.3. Statement of the fundamental theorem of arithmetic. There is 
nothing in the proof of Theorem 1 to show that (1.2.2) is a unique expression
of n, or, what is the same thing, that (1.2.1) is unique except for possible 
rearrangement of the factors; but consideration of special cases at once 
suggests that this is true. 

Theorem 2 (The fundamental theorem of arithmetic). The standard 
form of n is unique; apart from rearrangement of factors, n can be expressed 
as a product of primes in one way only. 

Theorem 2 is the foundation of systematic arithmetic, but we shall not 
use it in this chapter, and defer the proof to § 2.10. It is however convenient
to prove at once that it is a corollary of the simpler theorem which follows. 

Theorem 3 (Euclid’s first theorem). If p is prime, and $p \mid ab$, then $p \mid a$
or $p \mid b$.
We take this theorem for granted for the moment and deduce Theorem 2.
The proof of Theorem 2 is then reduced to that of Theorem 3, which is given 
in § 2.10. 

It is an obvious corollary of Theorem 3 that 


$p \mid abc \ldots l \to p \mid a$ or $p \mid b$ or $p \mid c$ ... or $p \mid l$,


and in particular that, if $a, b, \ldots, l$ are primes, then p is one of $a, b, \ldots, l$.

Suppose now that

$n = p_1^{a_1} p_2^{a_2} \ldots p_k^{a_k} = q_1^{b_1} q_2^{b_2} \ldots q_j^{b_j}$,

each product being a product of primes in standard form. Then $p_i \mid q_1^{b_1} \ldots q_j^{b_j}$
for every i, so that every p is a q; and similarly every q is a p. Hence $k = j$
and, since both sets are arranged in increasing order, $p_i = q_i$ for every i.

If $a_i > b_i$, and we divide by $p_i^{b_i}$, we obtain

$p_1^{a_1} \ldots p_i^{a_i - b_i} \ldots p_k^{a_k} = p_1^{b_1} \ldots p_{i-1}^{b_{i-1}} p_{i+1}^{b_{i+1}} \ldots p_k^{b_k}$.

The left-hand side is divisible by $p_i$, while the right-hand side is not; a
contradiction. Similarly $b_i > a_i$ yields a contradiction. It follows that
$a_i = b_i$, and this completes the proof of Theorem 2.

It will now be obvious why 1 should not be counted as a prime. If it 
were, Theorem 2 would be false, since we could insert any number of unit
factors. 


1.4. The sequence of primes. The first primes are 


2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, ....


It is easy to construct a table of primes, up to a moderate limit N, by
a procedure known as the ‘sieve of Eratosthenes’. We have seen that if
$n \le N$, and n is not prime, then n must be divisible by a prime not greater
than $\sqrt{N}$. We now write down the numbers

$2, 3, 4, 5, 6, \ldots, N$

and strike out successively

(i) 4, 6, 8, 10, ..., i.e. $2^2$ and then every even number,

(ii) 9, 15, 21, 27, ..., i.e. $3^2$ and then every multiple of 3 not yet struck
out,

(iii) 25, 35, 55, 65, ..., i.e. $5^2$, the square of the next remaining number
after 3, and then every multiple of 5 not yet struck out, ....

We continue the process until the next remaining number, after that whose
multiples were cancelled last, is greater than $\sqrt{N}$. The numbers which
remain are primes. All the present tables of primes have been constructed
by modifications of this procedure.
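
The striking-out procedure translates almost line for line into code. The Python sketch below is one possible rendering, not the construction used for any published table; the name `sieve` and the use of a `bytearray` of flags are choices made here for illustration.

```python
def sieve(N: int) -> list[int]:
    """Return the primes not exceeding N by the sieve of Eratosthenes."""
    if N < 2:
        return []
    remaining = bytearray([1]) * (N + 1)   # remaining[n] == 1: n not yet struck out
    remaining[0] = remaining[1] = 0
    p = 2
    while p * p <= N:                      # stop once the next number exceeds sqrt(N)
        if remaining[p]:
            # strike out p^2 and then every further multiple of p
            struck = len(range(p * p, N + 1, p))
            remaining[p * p :: p] = bytearray(struck)
        p += 1
    return [n for n in range(2, N + 1) if remaining[n]]


print(sieve(53))
```

Starting each round at $p^2$ is safe because every smaller multiple of p has a smaller prime factor and has already been struck out.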

The tables indicate that the series of primes is infinite. They are complete
up to 100,000,000; the total number of primes below 10 million is 664,579;
and the number between 9,900,000 and 10,000,000 is 6,134. The total
number of primes below 1,000,000,000 is 50,847,534; these primes are
not known individually. A number of very large primes, mostly of the form
$2^p - 1$ (see § 2.5), are also known; the largest found so far has just over
6,500 digits.†

These data suggest the theorem 

Theorem 4 (Euclid’s second theorem). The number of primes is inf- 
inite. 

We shall prove this in § 2.1.

The ‘average’ distribution of the primes is very regular; its density shows 
a steady but slow decrease. The numbers of primes in the first five blocks 
of 1,000 numbers are 


168, 135, 127, 120, 119,

and those in the last five blocks of 1,000 below 10,000,000 are

62, 58, 67, 64, 53.

The last 53 primes are divided into sets of 

5, 4, 7, 4, 6, 3, 6, 4, 5, 9 

in the ten hundreds of the thousand. 
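
Block counts of this kind are easy to reproduce. The Python sketch below counts primes in consecutive blocks of 1,000; the helper names `prime_flags` and `primes_per_block` are hypothetical, and the block boundaries (each block runs from start + 1 to start + size inclusive) are an assumption about how the tables were read.

```python
def prime_flags(limit: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime, for 0 <= n <= limit (limit >= 1)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    p = 2
    while p * p <= limit:
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        p += 1
    return flags


def primes_per_block(start: int, blocks: int, size: int = 1000) -> list[int]:
    """Number of primes in each of `blocks` consecutive blocks of `size` numbers,
    the first block being start + 1, ..., start + size."""
    flags = prime_flags(start + blocks * size)
    return [
        sum(flags[start + i * size + 1 : start + (i + 1) * size + 1])
        for i in range(blocks)
    ]


print(primes_per_block(0, 5))   # first five blocks of 1,000 numbers
```

Calling `primes_per_block(9_995_000, 5)` should give the corresponding counts for the last five blocks below 10,000,000, at the cost of sieving up to ten million.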

On the other hand the distribution of the primes in detail is extremely 
irregular. 

In the first place, the tables show at intervals long blocks of composite
numbers. Thus the prime 370,261 is followed by 111 composite numbers.
It is easy to see that these long blocks must occur. Suppose that

$2, 3, 5, \ldots, p$

are the primes up to p. Then all numbers up to p are divisible by one of
these primes, and therefore, if

$2 \cdot 3 \cdot 5 \ldots p = q$,

all of the $p - 1$ numbers

$q + 2, q + 3, q + 4, \ldots, q + p$

are composite. If Theorem 4 is true, then p can be as large as we please;
and otherwise all numbers from some point on are composite.

† See the end of chapter notes.

Theorem 5. There are blocks of consecutive composite numbers whose 
length exceeds any given number N. 
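
The construction behind Theorem 5 can be carried out explicitly for small p. The Python sketch below builds q as the product of the primes up to p and lists q + 2, ..., q + p; the function names are hypothetical, and trial division is used only because the numbers involved are small.

```python
from math import prod


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def composite_block(p: int) -> list[int]:
    """Return q + 2, ..., q + p, where q = 2.3.5...p is the product of the primes up to p."""
    q = prod(r for r in range(2, p + 1) if is_prime(r))
    return [q + k for k in range(2, p + 1)]


block = composite_block(7)                 # q = 2.3.5.7 = 210
print(block)                               # [212, 213, 214, 215, 216, 217]
print(any(is_prime(m) for m in block))     # False
```

Each q + k is divisible by any prime factor of k, since that prime also divides q. The blocks found this way are far from the earliest ones: the block of 111 composites after 370,261 occurs long before the product of the primes up to 113.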

On the other hand, the tables indicate the indefinite persistence of prime-pairs,
such as 3, 5 or 101, 103, differing by 2. There are 1,224 such pairs
$(p, p + 2)$ below 100,000, and 8,169 below 1,000,000. The evidence, when
examined in detail, appears to justify the conjecture

There are infinitely many prime-pairs $(p, p + 2)$.

It is indeed reasonable to conjecture more. The numbers $p, p + 2, p + 4$
cannot all be prime, since one of them must be divisible by 3; but there
is no obvious reason why $p, p + 2, p + 6$ should not all be prime, and the
evidence indicates that such prime-triplets also persist indefinitely. Similarly,
it appears that triplets $(p, p + 4, p + 6)$ persist indefinitely. We are
therefore led to the conjecture

There are infinitely many prime-triplets of the types $(p, p + 2, p + 6)$ and
$(p, p + 4, p + 6)$.

Such conjectures, with larger sets of primes, may be multiplied, but their 
proof or disproof is at present beyond the resources of mathematics. 
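
The counts of prime-pairs quoted above can be reproduced with a sieve, and the same code counts triplets of either type. This Python sketch is illustrative only; `count_patterns` and its `offsets` parameter are hypothetical names, and 'below the limit' is taken here to mean that the largest member of the pattern is less than the limit.

```python
def prime_flags(limit: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime, for 0 <= n <= limit (limit >= 1)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    p = 2
    while p * p <= limit:
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        p += 1
    return flags


def count_patterns(limit: int, offsets: tuple[int, ...]) -> int:
    """Count the p for which p + d is prime for every d in offsets,
    with the largest member p + max(offsets) less than limit."""
    flags = prime_flags(limit)
    top = max(offsets)
    return sum(
        1
        for p in range(2, limit - top)
        if all(flags[p + d] for d in offsets)
    )


print(count_patterns(100_000, (0, 2)))      # prime-pairs (p, p + 2)
print(count_patterns(100_000, (0, 2, 6)))   # triplets (p, p + 2, p + 6)
print(count_patterns(100_000, (0, 4, 6)))   # triplets (p, p + 4, p + 6)
```

Such counts illustrate the conjectures but, as the text says, do not settle them.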

1.5. Some questions concerning primes. What are the natural ques- 
tions to ask about a sequence of numbers such as the primes? We have 
suggested some already, and we now ask some more. 

(1) Is there a simple general formula for the n-th prime $p_n$† (a formula,
that is to say, by which we can calculate the value of $p_n$ for any given n with
less labour than by the use of the sieve of Eratosthenes)? No such formula
is known and it is unlikely that such a formula is possible.

On the other hand, it is possible to devise a number of ‘formulae’ for
$p_n$. Of these, some are no more than curiosities since they define $p_n$ in terms
of itself, and no previously unknown $p_n$ can be calculated from them. We
give an example in Theorem 419. Others would in theory enable us to
calculate $p_n$, but only at the cost of substantially more labour than does the
sieve of Eratosthenes. Others still are essentially equivalent to that sieve.
We return to these questions in § 2.7 and in §§ 1, 2 of the Appendix.

† See the end of chapter notes.

Similar remarks apply to another question of the same kind, viz. 

(2) is there a simple general formula for the prime which follows a given
prime (i.e. a recurrence formula such as $p_{n+1} = p_n^2 + 2$)?

Another natural question is 

(3) is there a rule by which, given any prime p, we can find a larger 
prime q? 

This question of course presupposes that, as stated in Theorem 4, the 
number of primes is infinite. It would be answered in the affirmative if 
any simple function $f(n)$ were known which assumed prime values for
all integral values of n. Apart from trivial curiosities of the kind already
mentioned, no such function is known. The only plausible conjecture concerning
the form of such a function was made by Fermat,† and Fermat’s
conjecture was false.

Our next question is 

(4) how many primes are there less than a given number x? 

This question is a much more profitable one, but it requires careful
interpretation. Suppose that, as is usual, we define $\pi(x)$ to be the number
of primes which do not exceed x, so that $\pi(1) = 0$, $\pi(2) = 1$, $\pi(20) = 8$.
If $p_n$ is the nth prime then $\pi(p_n) = n$, so that $\pi(x)$, as function of x, and
$p_n$, as function of n, are inverse functions. To ask for an exact formula for
$\pi(x)$, of any simple type, is therefore practically to repeat question (1).

We must therefore interpret the question differently, and ask ‘about how
many primes ...?’ Are most numbers primes, or only a small proportion?
Is there any simple function $f(x)$ which is ‘a good measure’ of $\pi(x)$?

We answer these questions in § 1.8 and Ch. XXII.
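
The inverse relation between $\pi(x)$ and $p_n$ can be seen directly on small values. The Python sketch below uses plain trial division, which is adequate only for small arguments; `prime_pi` and `nth_prime` are hypothetical names introduced for the illustration.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_pi(x: float) -> int:
    """pi(x): the number of primes which do not exceed x."""
    return sum(1 for n in range(2, int(x) + 1) if is_prime(n))


def nth_prime(n: int) -> int:
    """p_n: the n-th prime, with p_1 = 2."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


print(prime_pi(1), prime_pi(2), prime_pi(20))                     # 0 1 8
print(all(prime_pi(nth_prime(n)) == n for n in range(1, 30)))     # True
```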


† See § 2.5.


1.6. Some notations. We shall often use the symbols

(1.6.1) $O, \quad o, \quad \sim$,

and occasionally

(1.6.2) $\prec, \quad \succ, \quad \asymp$.

These symbols are defined as follows. 

Suppose that n is an integral variable which tends to infinity, and x a
continuous variable which tends to infinity or to zero or to some other
limiting value; that $\phi(n)$ or $\phi(x)$ is a positive function of n or x; and that
$f(n)$ or $f(x)$ is any other function of n or x. Then

(i) $f = O(\phi)$ means that† $|f| < A\phi$,

where A is independent of n or x, for all values of n or x in question;

(ii) $f = o(\phi)$ means that $f/\phi \to 0$;

and

(iii) $f \sim \phi$ means that $f/\phi \to 1$.

Thus

$10x = O(x), \quad \sin x = O(1), \quad x = O(x^2),$
$x = o(x^2), \quad \sin x = o(x), \quad x + 1 \sim x,$

where $x \to \infty$, and

$x^2 = O(x), \quad x^2 = o(x), \quad \sin x \sim x, \quad 1 + x \sim 1,$

when $x \to 0$. It is to be observed that $f = o(\phi)$ implies, and is stronger
than, $f = O(\phi)$.

As regards the symbols (1.6.2),

(iv) $f \prec \phi$ means $f/\phi \to 0$, and is equivalent to $f = o(\phi)$;

(v) $f \succ \phi$ means $f/\phi \to \infty$;

(vi) $f \asymp \phi$ means $A\phi < f < A\phi$,

where the two A’s (which are naturally not the same) are both positive and
independent of n or x. Thus $f \asymp \phi$ asserts that ‘f is of the same order of
magnitude as $\phi$’.

We shall very often use A as in (vi), viz. as an unspecified positive
constant. Different A’s have usually different values, even when they occur
in the same formula; and, even when definite values can be assigned to
them, these values are irrelevant to the argument.


† $|f|$ denotes, as usually in analysis, the modulus or absolute value of f.
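
The behaviour of the quotient f/φ in definitions (i) to (iii) can be watched numerically. The Python sketch below tabulates a few of the quotients from the examples as x grows or shrinks; the helper `ratios` and the sample points are assumptions made for the illustration, and a finite table can suggest a limit but cannot establish it.

```python
import math


def ratios(f, phi, xs):
    """Values of f(x)/phi(x) at the sample points xs."""
    return [f(x) / phi(x) for x in xs]


large = [10.0 ** k for k in range(1, 7)]    # sample points with x -> infinity
small = [10.0 ** -k for k in range(1, 7)]   # sample points with x -> 0

# x + 1 ~ x as x -> infinity: the quotient approaches 1
print(ratios(lambda x: x + 1, lambda x: x, large))
# x = o(x^2) as x -> infinity: the quotient approaches 0
print(ratios(lambda x: x, lambda x: x * x, large))
# sin x = O(1): the quotient stays bounded, though it has no limit
print(ratios(math.sin, lambda x: 1.0, large))
# sin x ~ x as x -> 0: the quotient approaches 1
print(ratios(math.sin, lambda x: x, small))
```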
So far we have defined (for example) ‘$f = O(1)$’, but not ‘$O(1)$’ in
isolation; and it is convenient to make our notations more elastic. We agree
that ‘$O(\phi)$’ denotes an unspecified f such that $f = O(\phi)$. We can then
write, for example,

$O(1) + O(1) = O(1) = o(x)$

when $x \to \infty$, meaning by this ‘if $f = O(1)$ and $g = O(1)$ then $f + g = O(1)$
and a fortiori $f + g = o(x)$’. Or again we may write

$\sum_{\nu=1}^{n} O(1) = O(n)$,

meaning by this that the sum of n terms, each numerically less than a
constant, is numerically less than a constant multiple of n.

It is to be observed that the relation ‘=’, asserted between O or o symbols,
is not usually symmetrical. Thus $o(1) = O(1)$ is always true; but $O(1) = o(1)$
is usually false. We may also observe that $f \sim \phi$ is equivalent to
$f = \phi + o(\phi)$ or to

$f = \phi\{1 + o(1)\}$.

In these circumstances we say that f and $\phi$ are asymptotically equivalent,
or that f is asymptotic to $\phi$.

There is another phrase which it is convenient to define here. Suppose 
that P is a possible property of a positive integer, and P(x) the number of 
numbers less than x which possess the property P. If 

$P(x) \sim x$,

when $x \to \infty$, i.e. if the number of numbers less than x which do not
possess the property is $o(x)$, then we say that almost all numbers possess
the property. Thus we shall see† that $\pi(x) = o(x)$, so that almost all
numbers are composite.

1.7. The logarithmic function. The theory of the distribution of primes
demands a knowledge of the properties of the logarithmic function log x.
We take the ordinary analytic theory of logarithms and exponentials for
granted, but it is important to lay stress on one property of log x.‡

† This follows at once from Theorem 7.

‡ log x is, of course, the ‘Napierian’ logarithm of x, to base e. ‘Common’ logarithms have no
mathematical interest.
Since

$e^x = 1 + x + \ldots + \frac{x^n}{n!} + \frac{x^{n+1}}{(n+1)!} + \ldots$,

$x^{-n} e^x > \frac{x}{(n+1)!} \to \infty$

when $x \to \infty$. Hence $e^x$ tends to infinity more rapidly than any power of
x. It follows that log x, the inverse function, tends to infinity more slowly
than any positive power of x; $\log x \to \infty$, but

(1.7.1) $\frac{\log x}{x^{\delta}} \to 0$,

or $\log x = o(x^{\delta})$, for every positive $\delta$. Similarly, loglog x tends to infinity
more slowly than any power of log x.

We may give a numerical illustration of the slowness of the growth of
log x. If $x = 10^9 = 1{,}000{,}000{,}000$ then

$\log x = 20.72\ldots$.

Since $e^3 = 20.08\ldots$, loglog x is a little greater than 3, and logloglog x a
little greater than 1. If $x = 10^{1,000}$, logloglog x is a little greater than 2. In
spite of this, the ‘order of infinity’ of logloglog x has been made to play a
part in the theory of primes.

The function

$\frac{x}{\log x}$

is particularly important in the theory of primes. It tends to infinity more
slowly than x but, in virtue of (1.7.1), more rapidly than $x^{1-\delta}$, i.e. than any
power of x lower than the first; and it is the simplest function which has
this property.
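
The numerical illustration of this section can be checked in a few lines. The Python sketch below evaluates iterated logarithms for $x = 10^9$ and $x = 10^{1,000}$; the helper `iterated_log` is a hypothetical name, and the sketch relies on `math.log` accepting arbitrarily large Python integers.

```python
import math


def iterated_log(x: int, times: int) -> float:
    """Apply the natural logarithm `times` times (times >= 1) to x."""
    value = math.log(x)            # works even when x is too large for a float
    for _ in range(times - 1):
        value = math.log(value)
    return value


x = 10 ** 9
print(iterated_log(x, 1))              # log x, about 20.72
print(math.exp(3))                     # e^3, about 20.09
print(iterated_log(x, 2))              # loglog x, a little greater than 3
print(iterated_log(x, 3))              # logloglog x, a little greater than 1
print(iterated_log(10 ** 1000, 3))     # a little greater than 2
```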

1.8. Statement of the prime number theorem. After this preface we 
can state the theorem which answers question (4) of § 1.5. 

Theorem 6 (The prime number theorem). The number of primes not
exceeding x is asymptotic to x/log x:

$\pi(x) \sim \frac{x}{\log x}$.
This theorem is the central theorem in the theory of the distribution of
primes. We shall give a proof in Ch. XXII. This proof is not easy but, in
the same chapter, we shall give a much simpler proof of the weaker


Theorem 7 (Tchebychef’s theorem). The order of magnitude of $\pi(x)$ is
x/log x:

$\pi(x) \asymp \frac{x}{\log x}$.

It is interesting to compare Theorem 6 with the evidence of the tables. 
The values of $\pi(x)$ for $x = 10^3$, $x = 10^6$, and $x = 10^9$ are


168, 78,498, 50,847,534;

and the values of x/log x, to the nearest integer, are

145, 72,382, 48,254,942.


The ratios are


1.159..., 1.084..., 1.053...;

and show an approximation, though not a very rapid one, to 1. The excess of
the actual over the approximate values can be accounted for by the general 
theory. 
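
The comparison with the tables can be repeated as follows. In this Python sketch the values of $\pi(x)$ are simply the ones quoted above, taken on trust rather than recomputed, and the names `QUOTED_PI` and `compare` are hypothetical.

```python
import math

# Values of pi(x) quoted in the text; they are not recomputed here.
QUOTED_PI = {10 ** 3: 168, 10 ** 6: 78_498, 10 ** 9: 50_847_534}


def compare(quoted=QUOTED_PI):
    """Rows (x, pi(x), x/log x to the nearest integer, pi(x) divided by x/log x)."""
    rows = []
    for x, pi_x in sorted(quoted.items()):
        approx = x / math.log(x)
        rows.append((x, pi_x, round(approx), pi_x / approx))
    return rows


for x, pi_x, approx, ratio in compare():
    print(f"{x:>13,}  {pi_x:>11,}  {approx:>11,}  {ratio:.4f}")
```

The ratios here are formed with the unrounded value of x/log x, so the last printed digits may differ slightly from figures obtained by dividing by the rounded integers.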

If

$y = \frac{x}{\log x}$

then

$\log y = \log x - \log\log x$,

and

$\log\log x = o(\log x)$,

so that


$\log y \sim \log x, \quad x = y \log x \sim y \log y$.

The function inverse to x/log x is therefore asymptotic to x log x. 
From this remark we infer that Theorem 6 is equivalent to 
Theorem 8:


$p_n \sim n \log n$.

Similarly, Theorem 7 is equivalent to

Theorem 9:


$p_n \asymp n \log n$.

The 664,999th prime is 10,006,721; the reader should compare these 
figures with Theorem 8. 
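
The comparison suggested here takes one line of arithmetic. The Python sketch below uses the two figures quoted in the text, taken on trust, and a hypothetical helper `compare_nth_prime`.

```python
import math


def compare_nth_prime(n: int, p_n: int) -> tuple[float, float]:
    """Return (n log n, p_n divided by n log n) for a quoted pair (n, p_n)."""
    estimate = n * math.log(n)
    return estimate, p_n / estimate


# Figures quoted in the text: the 664,999th prime is 10,006,721.
estimate, ratio = compare_nth_prime(664_999, 10_006_721)
print(round(estimate), round(ratio, 3))
```

The quotient comes out a little above 1.12, so at this range n log n still falls well short of $p_n$; Theorem 8 asserts only that the quotient tends to 1, and says nothing about how fast.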

We arrange what we have to say about primes and their distribution 
in three chapters. This introductory chapter contains little but definitions 
and preliminary explanations; we have proved nothing except the easy, 
though important, Theorem 1. In Ch. II we prove rather more: in particular,
Euclid’s theorems 3 and 4. The first of these carries with it (as we saw in
§ 1.3) the ‘fundamental theorem’ Theorem 2, on which almost all our later
work depends; and we give two proofs in §§ 2.10-2.11. We prove Theorem
4 in §§ 2.1, 2.4, and 2.6, using several methods, some of which enable us
to develop the theorem a little further. Later, in Ch. XXII, we return to
the theory of the distribution of primes, and develop it as far as is possible
by elementary methods, proving, amongst other results, Theorem 7 and
finally Theorem 6.


NOTES 

§ 1.3. Theorem 3 is Euclid vii. 30. Theorem 2 does not seem to have been stated explicitly
before Gauss (D.A., § 16). It was, of course, familiar to earlier mathematicians; but Gauss
was the first to develop arithmetic as a systematic science. See also § 12.5.

§ 1.4. The best table of factors is D. N. Lehmer’s Factor table for the first ten millions
(Carnegie Institution, Washington 105 (1909)) which gives the smallest factor of all numbers
up to 10,017,000 not divisible by 2, 3, 5, or 7. The same author’s List of prime numbers from
1 to 10,006,721 (Carnegie Institution, Washington 165 (1914)) has been extended up to $10^8$
by Baker and Gruenberger (The first six million prime numbers, Rand Corp., Microcard
Found., Madison 1959). Information about earlier tables will be found in the introduction
to Lehmer’s two volumes and in Dickson’s History, i, ch. xiii. Our numbers of primes are
less by 1 than Lehmer’s because he counts 1 as a prime. Mapes (Math. Computation 17
(1963), 184-5) gives a table of $\pi(x)$ for x any multiple of 10 million up to 1,000 million.

A list of tables of primes with descriptive notes is given in D. H. Lehmer's Guide to tables 
in the theory of numbers (Washington, 1941). Large tables of primes are essentially obso- 
lete now, since computers can generate primes afresh with sufficient rapidity for practical 
purposes. 

Theorem 4 is Euclid ix. 20. 

For Theorem 5 see Lucas, Theorie des nombres, i (1891), 359-61. 
Kraitchik [Sphinx, 6 (1936), 166 and 8 (1938), 86] lists all primes between $10^{12} - 10^4$ and 
$10^{12} + 10^4$; and Jones, Lal, and Blundon (Math. Comp. 21 (1967), 103-7) have tabulated 
all primes in the range $10^k$ to $10^k + 150{,}000$ for integer $k$ from 8 to 15. The largest known 
pair of primes $p, p + 2$ is 

$2003663613 \cdot 2^{195000} \pm 1,$ 

found by Vautier in 2007. These primes have 58711 decimal digits. 

In § 22.20 we give a simple argument leading to a conjectural formula for the number 
of pairs (p,p + 2) below x. This agrees well with the known facts. The method can be 
used to find many other conjectural theorems concerning pairs, triplets, and larger blocks 
of primes. 

§ 1.5. Our list of questions is modified from that given by Carmichael, Theory of numbers, 
29. Of course we have not (and cannot) define what we mean by a ‘simple formula’ in this 
context. One could more usefully ask about algorithms for computing the nth prime. Clearly 
there is an algorithm, given by the sieve of Eratosthenes. Thus the interesting question is just 
how fast such an algorithm might be. A method based on the work of Lagarias and Odlyzko 
(J. Algorithms 8 (1987), 173-91) computes $p_n$ in time $O(n^{3/5})$ (or indeed slightly faster 
if large amounts of memory are available). For questions (2) and (3) one might similarly 
ask how fast one can find $p_{n+1}$ given $p_n$, or more generally, how rapidly one can find any 
prime greater than a given prime $p$. At present it appears that the best approach is merely to 
test each number from $p_n$ onwards for primality. One would conjecture that this process is 
extremely efficient, in as much as there should be a constant $c > 0$ such that the next prime 
is found in time $O((\log n)^c)$. We have a very fast test for primality, due to Agrawal, Kayal, 
and Saxena (Ann. of Math. (2) 160 (2004), 781-93), but the best known upper bound on 
the difference $p_{n+1} - p_n$ is only $O(p_n^{0.525})$. (See Baker, Harman, and Pintz, Proc. London 
Math. Soc. (3) 83 (2001), 532-62.) Thus at present we can only say that $p_{n+1}$ can be 
determined, given $p_n$, in time $O(p_n^{\theta})$, for any constant $\theta > 0.525$. 
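
The procedure of testing each number after $p_n$ in turn can be sketched in a few lines of Python. This is an illustration only: plain trial division is used as the primality test (an assumption made for brevity, standing in for a fast test such as that of Agrawal, Kayal, and Saxena), and the names `is_prime` and `next_prime` are ours.

```python
def is_prime(m: int) -> bool:
    """Trial division; adequate for small m, far slower than a modern test."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def next_prime(p: int) -> int:
    """Return the least prime greater than p by testing p+1, p+2, ... in turn."""
    m = p + 1
    while not is_prime(m):
        m += 1
    return m


if __name__ == "__main__":
    # The gap after 113 is unusually long for a prime of its size.
    print(next_prime(113))
```

The cost of the search is governed by the gap $p_{n+1} - p_n$, which is why the bound on that gap, rather than the speed of the primality test, limits what can be proved.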

§ 1.7. Littlewood’s proof that $\pi(x)$ is sometimes greater than the ‘logarithm integral’ 
$\mathrm{Li}(x)$ depends upon the largeness of $\log\log\log x$ for large $x$. See Ingham, ch. v, or Landau, 
Vorlesungen, ii. 123-56. 

§ 1.8. Theorem 7 was proved by Tchebychef about 1850, and Theorem 6 by Hadamard 
and de la Vallée Poussin in 1896. See Ingham, 4-5; Landau, Handbuch, 3-55; and Ch. XXII, 
especially the note to §§ 22.14-16. 

A better approximation to $\pi(x)$ is provided by the ‘logarithmic integral’ 

$\mathrm{Li}(x) = \int_2^x \frac{dt}{\log t}.$ 

Thus at $x = 10^9$, for example, $\pi(x)$ and $x/\log x$ differ by more than 2,500,000, while $\pi(x)$ 
and $\mathrm{Li}(x)$ only differ by about 1,700. 
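
The same comparison can be made at a smaller value of $x$ with a short Python sketch. The choice $x = 10^6$, the use of Simpson's rule for the integral, and the function names are assumptions made here for illustration; at $x = 10^9$ the sieve would need correspondingly more memory and time.

```python
import math


def prime_count(x: int) -> int:
    """pi(x), computed with the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            # Strike out the multiples of i, starting from i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)


def li_from_2(x: float, steps: int = 2_000_000) -> float:
    """Composite Simpson approximation to the integral of dt/log t over [2, x]."""
    if steps % 2:
        steps += 1
    h = (x - 2.0) / steps
    total = 1.0 / math.log(2.0) + 1.0 / math.log(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) / math.log(2.0 + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    x = 10**6
    print("pi(x)    =", prime_count(x))
    print("x/log x  =", x / math.log(x))
    print("Li(x)    =", li_from_2(x))
```

At $x = 10^6$ we have $\pi(x) = 78498$; the sketch shows $x/\log x$ falling short by several thousand, while $\mathrm{Li}(x)$ overshoots by little more than a hundred.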
2.1. First proof of Euclid’s second theorem. Euclid’s own proof of 
Theorem 4 was as follows. 

Let $2, 3, 5, \ldots, p$ be the aggregate of primes up to $p$, and let 

(2.1.1) $q = 2 \cdot 3 \cdot 5 \cdots p + 1.$ 

Then $q$ is not divisible by any of the numbers $2, 3, 5, \ldots, p$. It is therefore 
either prime, or divisible by a prime between $p$ and $q$. In either case there 
is a prime greater than $p$, which proves the theorem. 

The theorem is equivalent to 

(2.1.2) $\pi(x) \to \infty.$ 
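
Euclid's construction is easily tried numerically. The following Python sketch (the function names are ours, and trial division is used only because the numbers are small) forms $q$ from the primes up to 13 and exhibits a prime factor greater than 13; it also shows that $q$ itself need not be prime.

```python
from math import prod


def smallest_prime_factor(m: int) -> int:
    """Least prime dividing m (m >= 2), by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m


def euclid_number(primes) -> int:
    """q = (product of the given primes) + 1, as in (2.1.1)."""
    return prod(primes) + 1


if __name__ == "__main__":
    primes = [2, 3, 5, 7, 11, 13]
    q = euclid_number(primes)
    # q = 30031 = 59 * 509 is composite, but its least prime factor exceeds 13.
    print(q, smallest_prime_factor(q))
```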

2.2. Further deductions from Euclid’s argument. If $p$ is the $n$th prime 
$p_n$, and $q$ is defined as in (2.1.1), it is plain that 

$q < p_n^n + 1$ 

for $n > 1$,† and so that 

$p_{n+1} < p_n^n + 1.$ 

This inequality enables us to assign an upper limit to the rate of increase 
of $p_n$, and a lower limit to that of $\pi(x)$. 

We can, however, obtain better limits as follows. Suppose that 

(2.2.1) $p_n < 2^{2^n}$ 

for $n = 1, 2, \ldots, N$. Then Euclid’s argument shows that 

(2.2.2) $p_{N+1} \le p_1 p_2 \cdots p_N + 1 < 2^{2 + 4 + \cdots + 2^N} + 1 < 2^{2^{N+1}}.$ 

Since (2.2.1) is true for $n = 1$, it is true for all $n$. 
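
The two inequalities used in this induction can be checked directly for the first few primes. The sketch below is illustrative only; `first_primes` and `check_bounds` are names chosen here, and the check is of course no substitute for the proof.

```python
def first_primes(count: int) -> list:
    """The first `count` primes, found by trial division by earlier primes."""
    primes = []
    m = 2
    while len(primes) < count:
        if all(m % p for p in primes if p * p <= m):
            primes.append(m)
        m += 1
    return primes


def check_bounds(count: int) -> bool:
    """Check p_n < 2^(2^n) and p_(n+1) <= p_1 p_2 ... p_n + 1 for n = 1..count."""
    primes = first_primes(count + 1)
    product = 1
    for n in range(1, count + 1):
        p_n = primes[n - 1]
        product *= p_n
        if not p_n < 2 ** (2 ** n):
            return False
        if not primes[n] <= product + 1:
            return False
    return True


if __name__ == "__main__":
    print(check_bounds(12))
```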

† There is equality when $n = 1$, $p = 2$, $q = 3$. 
Suppose now that $n \ge 4$ and 

$e^{e^{n-1}} < x \le e^{e^n}.$ 

Then† 

$e^{n-1} > 2^n, \quad e^{e^{n-1}} > 2^{2^n};$ 

and so 

$\pi(x) \ge \pi(e^{e^{n-1}}) \ge \pi(2^{2^n}) \ge n,$ 

by (2.2.1). Since $\log\log x \le n$, we deduce that 

$\pi(x) \ge \log\log x$ 

for $x > e^{e^3}$; and it is plain that the inequality holds also for $2 \le x \le e^{e^3}$. 

We have therefore proved 

Theorem 10: 

$\pi(x) \ge \log\log x \quad (x \ge 2).$ 

We have thus gone beyond Theorem 4 and found a lower limit for the 
order of magnitude of $\pi(x)$. The limit is of course an absurdly weak one, 
since for $x = 10^9$ it gives $\pi(x) \ge 3$, and the actual value of $\pi(x)$ is over 50 
million. 
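
How weak the bound is can be seen from a direct computation. The sketch below (with names of our own choosing, and an arbitrary limit of $10^4$) checks Theorem 10 for every integer $x$ in a small range and evaluates $\log\log x$ at $x = 10^9$.

```python
import math


def primes_up_to(limit: int) -> list:
    """All primes not exceeding limit, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def check_theorem_10(limit: int) -> bool:
    """Check pi(x) >= log log x for every integer x with 2 <= x <= limit."""
    prime_set = set(primes_up_to(limit))
    count = 0
    for x in range(2, limit + 1):
        if x in prime_set:
            count += 1
        if count < math.log(math.log(x)):
            return False
    return True


if __name__ == "__main__":
    print(check_theorem_10(10**4))
    print(math.log(math.log(10**9)))  # a little over 3
```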

2.3. Primes in certain arithmetical progressions. Euclid’s argument 
may be developed in other directions. 

Theorem 11. There are infinitely many primes of the form $4n + 3$. 

Define $q$ by 

$q = 2^2 \cdot 3 \cdot 5 \cdots p - 1,$ 

instead of by (2.1.1). Then $q$ is of the form $4n + 3$, and is not divisible by 
any of the primes up to $p$. It cannot be a product of primes $4n + 1$ only, since 
the product of two numbers of this form is of the same form; and therefore 
it is divisible by a prime $4n + 3$, greater than $p$. 
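
The construction in this proof can be followed on small cases. In the Python sketch below (function names are ours; factorization is by trial division, which suffices only for small $p$) we form $q$ and pick out a prime factor of the form $4n + 3$.

```python
def prime_factors(m: int) -> list:
    """Prime factors of m in increasing order, with repetition."""
    factors = []
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def new_prime_4n_plus_3(odd_primes):
    """Given the odd primes 3, 5, ..., p, return (q, r) where
    q = 2^2 * 3 * 5 ... p - 1 and r is the least prime factor of q with r = 3 (mod 4)."""
    q = 4
    for p in odd_primes:
        q *= p
    q -= 1
    r = next(f for f in prime_factors(q) if f % 4 == 3)
    return q, r


if __name__ == "__main__":
    # q = 4619 = 31 * 149; here 31 is of the form 4n + 3 and exceeds 11.
    print(new_prime_4n_plus_3([3, 5, 7, 11]))
```

The last case shows that $q$ may have prime factors $4n + 1$ as well (here 149); the argument requires only that not all of them are of that form.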

Theorem 12. There are infinitely many primes of the form 6n + 5. 


† This is not true for $n = 3$. 





THE SERIES OF PRIMES 


[Chap. II 


The proof is similar. We define q by 

$q = 2 \cdot 3 \cdot 5 \cdots p - 1,$ 

and observe that any prime number, except 2 or 3, is $6n + 1$ or $6n + 5$, and 
that the product of two numbers $6n + 1$ is of the same form. 

The progression $4n + 1$ is more difficult. We must assume the truth of a 
theorem which we shall prove later (§ 20.3). 

Theorem 13. If $a$ and $b$ have no common factor, then any odd prime 
divisor of $a^2 + b^2$ is of the form $4n + 1$. 

If we take this for granted, we can prove that there are infinitely many 
primes $4n + 1$. In fact we can prove 

Theorem 14. There are infinitely many primes of the form $8n + 5$. 

We take 

$q = 3^2 \cdot 5^2 \cdot 7^2 \cdots p^2 + 2^2,$ 

a sum of two squares which have no common factor. The square of an odd 
number $2m + 1$ is 

$4m(m + 1) + 1$ 

and is $8n + 1$, so that $q$ is $8n + 5$. Observing that, by Theorem 13, any prime 
factor of $q$ is $4n + 1$, and so $8n + 1$ or $8n + 5$, and that the product of two 
numbers $8n + 1$ is of the same form, we can complete the proof as before. 

All these theorems are particular cases of a famous theorem of Dirichlet. 

Theorem 15* (Dirichlet’s theorem).† If $a$ is positive and $a$ and $b$ have 
no common divisor except 1, then there are infinitely many primes of the 
form $an + b$. 

The proof of this theorem is too difficult for insertion in this book. There 
are simpler proofs when $b$ is 1 or $-1$. 

† An asterisk attached to the number of a theorem indicates that it is not proved anywhere in the 
book. 
2.4. Second proof of Euclid’s theorem. Our second proof of Theorem 
4, which is due to Polya, depends upon a property of what are called 
‘Fermat’s numbers’. 

Fermat’s numbers are defined by 

$F_n = 2^{2^n} + 1,$ 

so that 

$F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad F_4 = 65537.$ 

They are of great interest in many ways: for example, it was proved by 
Gauss† that, if $F_n$ is a prime $p$, then a regular polygon of $p$ sides can be 
inscribed in a circle by Euclidean methods. 

The property of the Fermat numbers which is relevant here is 

Theorem 16. No two Fermat numbers have a common divisor greater 
than 1. 

For suppose that $F_n$ and $F_{n+k}$, where $k > 0$, are two Fermat numbers, 
and that 

$m \mid F_n, \quad m \mid F_{n+k}.$ 

If $x = 2^{2^n}$, we have 

$\frac{F_{n+k} - 2}{F_n} = \frac{2^{2^{n+k}} - 1}{2^{2^n} + 1} = \frac{x^{2^k} - 1}{x + 1} = x^{2^k - 1} - x^{2^k - 2} + \cdots - 1,$ 

and so $F_n \mid F_{n+k} - 2$. Hence 

$m \mid F_{n+k}, \quad m \mid F_{n+k} - 2;$ 

and therefore $m \mid 2$. Since $F_n$ is odd, $m = 1$, which proves the theorem. 

It follows that each of the numbers $F_1, F_2, \ldots, F_n$ is divisible by an odd 
prime which does not divide any of the others; and therefore that there are 
at least $n$ odd primes not exceeding $F_n$. This proves Euclid’s theorem. Also 

$p_{n+1} \le F_n = 2^{2^n} + 1,$ 

and it is plain that this inequality, which is a little stronger than (2.2.1), 
leads to a proof of Theorem 10. 
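
Both the divisibility $F_n \mid F_{n+k} - 2$ and the conclusion of Theorem 16 can be confirmed for small indices with Python's arbitrary-precision integers. The range of indices and the function names below are our own choices.

```python
from math import gcd


def fermat(n: int) -> int:
    """The Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def check_theorem_16(limit: int) -> bool:
    """Check F_n | F_(n+k) - 2 and (F_n, F_(n+k)) = 1 for 1 <= n < n+k <= limit."""
    for n in range(1, limit):
        for k in range(1, limit - n + 1):
            if (fermat(n + k) - 2) % fermat(n) != 0:
                return False
            if gcd(fermat(n), fermat(n + k)) != 1:
                return False
    return True


if __name__ == "__main__":
    print([fermat(n) for n in range(1, 5)])
    print(check_theorem_16(8))
```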


† See § 5.8. 
2.5. Fermat’s and Mersenne’s numbers. The first four Fermat numbers 
are prime, and Fermat conjectured that all were prime. Euler, however, 
found in 1732 that 

$F_5 = 2^{2^5} + 1 = 641 \cdot 6700417$ 

is composite. For 

$641 = 5^4 + 2^4 = 5 \cdot 2^7 + 1$ 

divides each of $5^4 \cdot 2^{28} + 2^{32}$ and $5^4 \cdot 2^{28} - 1$ and so divides their difference 
$F_5$. 
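
Each step of this argument is a statement about explicit integers and can be confirmed mechanically. The following Python lines do so; the variable names and the wrapping function are ours.

```python
def verify_euler_factor() -> bool:
    """Check the steps of the argument that 641 divides F_5."""
    f5 = 2 ** (2 ** 5) + 1
    a = 5 ** 4 * 2 ** 28 + 2 ** 32  # = 2^28 (5^4 + 2^4) = 2^28 * 641
    b = 5 ** 4 * 2 ** 28 - 1        # = (5 * 2^7)^4 - 1, divisible by 5 * 2^7 + 1
    return (
        641 == 5 ** 4 + 2 ** 4 == 5 * 2 ** 7 + 1
        and a % 641 == 0
        and b % 641 == 0
        and a - b == f5
        and f5 == 641 * 6700417
    )


if __name__ == "__main__":
    print(verify_euler_factor())
```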

In 1880 Landry proved that 

$F_6 = 2^{2^6} + 1 = 274177 \cdot 67280421310721.$ 

More recent writers have proved that $F_n$ is composite for 

$7 \le n \le 16, \quad n = 18, 19, 21, 23, 36, 38, 39, 55, 63, 73$ 

and many larger values of $n$. No factor is known for $F_{14}$, but in all the other 
cases proved to be composite a factor is known. 

No prime $F_n$ has been found beyond $F_4$, so that Fermat’s conjecture has 
not proved a very happy one. It is perhaps more probable that the number 
of primes $F_n$ is finite.† If this is so, then the number of primes $2^n + 1$ is 
finite, since it is easy to prove 

Theorem 17. If $a \ge 2$ and $a^n + 1$ is prime, then $a$ is even and $n = 2^m$. 

For if $a$ is odd then $a^n + 1$ is even; and if $n$ has an odd factor $k$ and 
$n = kl$, then $a^n + 1$ is divisible by 

$\frac{a^{kl} + 1}{a^l + 1} = a^{(k-1)l} - a^{(k-2)l} + \cdots + 1.$ 

† This is what is suggested by considerations of probability. Assuming Theorem 7, one might argue 
roughly as follows. The probability that a number $n$ is prime is at most 

$\frac{A}{\log n}$ 

and therefore the total expectation of Fermat primes is at most 

$A \sum \left\{ \frac{1}{\log(2^{2^n} + 1)} \right\} < A \sum 2^{-n} < A.$ 

This argument (apart from its general lack of precision) assumes that there are no special reasons why 
a Fermat number should be likely to be prime, while Theorems 16 and 17 suggest that there are some. 
It is interesting to compare the fate of Fermat’s conjecture with that of 
another famous conjecture, concerning primes of the form $2^n - 1$. We begin 
with another trivial theorem of much the same type as Theorem 17. 

Theorem 18. If $n > 1$ and $a^n - 1$ is prime, then $a = 2$ and $n$ is prime. 

For if $a > 2$, then $a - 1 \mid a^n - 1$; and if $a = 2$ and $n = kl$, then we have 
$2^k - 1 \mid 2^n - 1$. 

The problem of the primality of $a^n - 1$ is thus reduced to that of the 
primality of $2^p - 1$. It was asserted by Mersenne in 1644 that $M_p = 2^p - 1$ 
is prime for 

$p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257,$ 

and composite for the other 44 values of $p$ less than 257. The first mistake in 
Mersenne’s statement was found about 1886,* when Pervusin and Seelhoff 
discovered that $M_{61}$ is prime. Subsequently four further mistakes were 
found in Mersenne’s statement and it need no longer be taken seriously. 
In 1876 Lucas found a method for testing whether $M_p$ is prime and used it 
to prove $M_{127}$ prime. This remained the largest known prime until 1951, 
when, using different methods, Ferrier found a larger prime (using only a 
desk calculating machine) and Miller and Wheeler (using the EDSAC 1 
electronic computer at Cambridge) found several large primes, of which 
the largest was 

$180 M_{127}^2 + 1,$ 

which is larger than Ferrier’s. But Lucas’s test is particularly suitable for 
use on a binary digital computer and it has subsequently been applied by a 
succession of investigators (Lehmer and Robinson, Hurwitz and Selfridge, 
Riesel, Gillies, Tuckerman and finally Nickel and Noll). As a result it is 
now known that $M_p$ is prime for 

$p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107,$ 
$127, 521, 607, 1279, 2203, 2281, 3217,$ 
$4253, 4423, 9689, 9941, 11213, 19937, 21701,$ 

and composite for all other $p < 21700$. The largest known prime is thus 
$M_{21701}$, a number of 6533 digits.† 

* Euler stated in 1732 that $M_{41}$ and $M_{47}$ are prime, but this was a mistake. 
† See the end of chapter notes. 
We describe Lucas’s test in § 15.5 and give the test used by Miller and 
Wheeler in Theorem 101. 
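
For the reader who wishes to experiment before reaching § 15.5, the following Python sketch uses the form of the test now usually called the Lucas-Lehmer test: for an odd prime $p$, put $s_0 = 4$ and $s_{i+1} = s_i^2 - 2 \pmod{M_p}$; then $M_p$ is prime if and only if $s_{p-2} \equiv 0$. We assume this standard form here; it may differ in detail from the statement given later in the book, and the function names are ours.

```python
def is_prime(m: int) -> bool:
    """Trial division, used only for the small exponents p."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p: int) -> bool:
    """For a prime exponent p, decide whether M_p = 2^p - 1 is prime."""
    if p == 2:
        return True  # M_2 = 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_exponents(limit: int) -> list:
    """Primes p <= limit for which M_p is prime."""
    return [p for p in range(2, limit + 1) if is_prime(p) and lucas_lehmer(p)]


if __name__ == "__main__":
    print(mersenne_exponents(130))
```

Run up to $p = 130$, this reproduces the first twelve exponents of the list above, including 61, 89, and 107, which Mersenne omitted, and excluding 67, which he included.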

The problem of Mersenne’s numbers is connected with that of ‘perfect’ 
numbers, which we shall consider in § 16.8. 

We return to this subject in § 6.15 and § 15.5. 

2.6. Third proof of Euclid’s theorem. Suppose that $2, 3, \ldots, p_j$ are the 
first $j$ primes and let $N(x)$ be the number of $n$ not exceeding $x$ which are 
not divisible by any prime $p > p_j$. If we express such an $n$ in the form 

$n = n_1^2 m,$ 

where $m$ is ‘squarefree’, i.e. is not divisible by the square of any prime, we 
have 

$m = 2^{b_1} 3^{b_2} \cdots p_j^{b_j},$ 

with every $b$ either 0 or 1. There are just $2^j$ possible choices of the exponents 
and so not more than $2^j$ different values of $m$. Again, $n_1 \le \sqrt{n} \le \sqrt{x}$ and 
so there are not more than $\sqrt{x}$ different values of $n_1$. Hence 

(2.6.1) $N(x) \le 2^j \sqrt{x}.$ 

If Theorem 4 is false, so that the number of primes is finite, let the primes 
be $2, 3, \ldots, p_j$. In this case $N(x) = x$ for every $x$ and so 

$x \le 2^j \sqrt{x}, \quad x \le 2^{2j},$ 

which is false for $x \ge 2^{2j} + 1$. 
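
The inequality (2.6.1) can be illustrated by counting directly. The sketch below takes $j = 3$, so that the primes allowed are 2, 3, 5; the function name and the sample values of $x$ are our own choices.

```python
import math


def count_smooth(x: int, primes) -> int:
    """N(x): the number of n <= x having no prime factor outside `primes`."""
    count = 0
    for n in range(1, x + 1):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:
            count += 1
    return count


if __name__ == "__main__":
    primes = [2, 3, 5]
    for x in (10, 100, 1000, 10000):
        n_x = count_smooth(x, primes)
        bound = 2 ** len(primes) * math.sqrt(x)
        print(x, n_x, bound)
```

For $x = 100$ the count is 34 against a bound of 80. For small $x$ the bound exceeds $x$ itself and says nothing; the content of the proof is that $N(x)$ must fall below $x$ once $x > 2^{2j}$.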

We can use this argument to prove two further results. 

Theorem 19. The series 

(2.6.2) $\sum \frac{1}{p} = \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots$ 

is divergent. 

If the series is convergent, we can choose $j$ so that the remainder after $j$ 
terms is less than $\frac{1}{2}$, i.e. 

$\frac{1}{p_{j+1}} + \frac{1}{p_{j+2}} + \cdots < \frac{1}{2}.$ 

The number of $n \le x$ which are divisible by $p$ is at most $x/p$. Hence 
$x - N(x)$, the number of $n \le x$ divisible by one or more of $p_{j+1}, p_{j+2}, \ldots$, 
is not more than 

$\frac{x}{p_{j+1}} + \frac{x}{p_{j+2}} + \cdots < \frac{1}{2} x.$ 

Hence, by (2.6.1), 

$\frac{1}{2} x < N(x) \le 2^j \sqrt{x}, \quad x < 2^{2j+2},$ 

which is false for $x \ge 2^{2j+2}$. Hence the series diverges. 
Theorem 20: 

$\pi(x) \ge \frac{\log x}{2 \log 2} \quad (x \ge 1); \qquad p_n \le 4^n.$ 

We take $j = \pi(x)$, so that $p_{j+1} > x$ and $N(x) = x$. We have 

$x = N(x) \le 2^{\pi(x)} \sqrt{x}, \quad 2^{\pi(x)} \ge \sqrt{x},$ 

and the first part of Theorem 20 follows on taking logarithms. If we put 
$x = p_n$, so that $\pi(x) = n$, the second part is immediate. 

By Theorem 20, $\pi(10^9) \ge 15$; a number, of course, still ridiculously 
below the mark. 
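
Both parts of Theorem 20 are easily checked in a small range, and the figure 15 recovered. The following sketch is illustrative; the limit $10^4$ and the function names are assumptions of ours.

```python
import math


def primes_up_to(limit: int) -> list:
    """All primes not exceeding limit, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def check_theorem_20(limit: int) -> bool:
    """Check pi(x) >= log x / (2 log 2) for 1 <= x <= limit, and p_n <= 4^n."""
    primes = primes_up_to(limit)
    if any(p > 4 ** n for n, p in enumerate(primes, start=1)):
        return False
    prime_set = set(primes)
    count = 0
    for x in range(1, limit + 1):
        if x in prime_set:
            count += 1
        if count < math.log(x) / (2 * math.log(2)):
            return False
    return True


if __name__ == "__main__":
    print(check_theorem_20(10**4))
    # The lower bound for pi(10^9) given by the theorem, rounded up.
    print(math.ceil(math.log(10**9) / (2 * math.log(2))))
```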

2.7. Further results on formulae for primes. We return for a moment 
to the questions raised in § 1.5. We may ask for ‘a formula for primes’ in 
various senses. 

(i) We may ask for a simple function $f(n)$ which assumes all prime values 
and only prime values, i.e. which takes successively the values $p_1, p_2, \ldots$ 
when $n$ takes the values $1, 2, \ldots$. This is the question which we discussed 
in § 1.5. 

(ii) We may ask for a simple function of $n$ which assumes prime values 
only. Fermat’s conjecture, had it been right, would have supplied an answer 
to this question.† As it is, no satisfactory answer is known. But it is possible 
to construct a polynomial (in several positive integral variables) whose 
positive values are all prime and include all the primes, though its negative 
values are composite. See § 2 of the Appendix. 

† It had been suggested that Fermat’s sequence should be replaced by 

$2 + 1, \quad 2^2 + 1, \quad 2^{2^2} + 1, \quad 2^{2^{2^2}} + 1, \ldots.$ 

The first four numbers are prime, but $F_{16}$, the fifth member of this sequence, is now known to be 
composite. Another suggestion was that the sequence $M_p$, where $p$ is confined to the Mersenne primes, 
would contain only primes. But $M_{13} = 8191$ and $M_{8191}$ is composite. 

(iii) We may moderate our demands and ask merely for a simple function 
of $n$ which assumes an infinity of prime values. It follows from Euclid’s 
theorem that $f(n) = n$ is such a function, and less trivial answers are given 
by Theorems 11-15. Apart from trivial solutions, Dirichlet’s Theorem 15 
is the only solution known. It has never been proved that $n^2 + 1$, or any 
other quadratic form in $n$, will represent an infinity of primes, and all such 
problems seem to be extremely difficult. 

There are some simple negative theorems which contain a very partial 
reply to question (ii). 

Theorem 21. No polynomial $f(n)$ with integral coefficients, not a 
constant, can be prime for all $n$, or for all sufficiently large $n$. 

We may assume that the leading coefficient in $f(n)$ is positive, so that 
$f(n) \to \infty$ when $n \to \infty$, and $f(n) > 1$ for $n > N$, say. If $x > N$ and 

$f(x) = a_0 x^k + \cdots = y > 1,$ 

then 

$f(ry + x) = a_0 (ry + x)^k + \cdots$ 

is divisible by $y$ for every integral $r$; and $f(ry + x)$ tends to infinity with $r$. 
Hence there are infinitely many composite values of $f(n)$. 

There are quadratic forms which assume prime values for considerable 
sequences of values of $n$. Thus $n^2 - n + 41$ is prime for $0 \le n \le 40$, and 

$n^2 - 79n + 1601 = (n - 40)^2 + (n - 40) + 41$ 

for $0 \le n \le 79$. 
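
These runs of primes, and the way in which Theorem 21 brings them to an end, can be verified directly. In the Python sketch below the names `f` and `g` for the two quadratics are ours, and the choice $x = 1$, $y = f(1) = 41$ is made only for illustration.

```python
def is_prime(m: int) -> bool:
    """Trial division; sufficient for the small values considered here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def f(n: int) -> int:
    return n * n - n + 41


def g(n: int) -> int:
    return n * n - 79 * n + 1601


if __name__ == "__main__":
    print(all(is_prime(f(n)) for n in range(0, 41)))   # 0 <= n <= 40
    print(all(is_prime(g(n)) for n in range(0, 80)))   # 0 <= n <= 79
    # The first failures: both values are 41^2.
    print(f(41), g(80))
    # The mechanism of Theorem 21 with x = 1, y = f(1) = 41:
    # f(r*y + x) is divisible by y for every integer r.
    y = f(1)
    print(all(f(r * y + 1) % y == 0 for r in range(10)))
```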

A more general theorem, which we shall prove in § 6.4, is 

Theorem 22. If 

$f(n) = P(n, 2^n, 3^n, \ldots, k^n)$ 

is a polynomial in its arguments, with integral coefficients, and $f(n) \to \infty$ 
when $n \to \infty$,† then $f(n)$ is composite for an infinity of values of $n$. 

† Some care is required in the statement of the theorem, to avoid such an $f(n)$ as $2^n 3^n - 6^n + 5$, 
which is plainly prime for all $n$. 
2.8. Unsolved problems concerning primes. In § 1.4 we stated two 
conjectural theorems of which no proof is known, although empirical 
evidence makes their truth seem highly probable. There are many other 
conjectural theorems of the same kind. 

There are infinitely many primes $n^2 + 1$. More generally, if $a$, $b$, $c$ are 
integers without a common divisor, $a$ is positive, $a + b$ and $c$ are not both 
even, and $b^2 - 4ac$ is not a perfect square, then there are infinitely many 
primes $an^2 + bn + c$. 

We have already referred to the form $n^2 + 1$ in § 2.7 (iii). If $a$, $b$, $c$ have 
a common divisor, there can obviously be at most one prime of the form 
required. If $a + b$ and $c$ are both even, then $N = an^2 + bn + c$ is always even. 
If $b^2 - 4ac = k^2$, then 

$4aN = (2an + b)^2 - k^2.$ 

Hence, if $N$ is prime, either $2an + b + k$ or $2an + b - k$ divides $4a$, and this 
can be true for at most a finite number of values of $n$. The limitations stated 
in the conjecture are therefore essential. 

There is always a prime between $n^2$ and $(n + 1)^2$. 

If $n > 4$ is even, then $n$ is the sum of two odd primes. 

This is ‘Goldbach’s theorem’. 

If $n \ge 9$ is odd, then $n$ is the sum of three odd primes. 

Any $n$ from some point onwards is a square or the sum of a prime and a 
square. 

This is not true of all $n$; thus 34 and 58 are exceptions. 
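
Conjectures of this kind are readily tested in a limited range. The Python sketch below checks Goldbach's statement for even $n$ up to 1000 and confirms that 34 and 58 are exceptions to the last statement. The function names are ours, and we assume, for the purpose of the illustration, that the square $0^2$ is admitted, so that every prime counts as a prime plus a square.

```python
def is_prime(m: int) -> bool:
    """Trial division; sufficient for the small range examined."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def goldbach_pair(n: int):
    """For even n > 4, return odd primes (p, q) with p + q = n, or None."""
    for p in range(3, n // 2 + 1, 2):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None


def is_square_or_prime_plus_square(n: int) -> bool:
    """True if n = m^2 or n = p + m^2 for some prime p and integer m >= 0."""
    m = 0
    while m * m <= n:
        r = n - m * m
        if r == 0 or is_prime(r):
            return True
        m += 1
    return False


if __name__ == "__main__":
    print(all(goldbach_pair(n) is not None for n in range(6, 1001, 2)))
    print([n for n in (34, 58) if not is_square_or_prime_plus_square(n)])
```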

A more dubious conjecture, to which we referred in § 2.5, is 

The number of Fermat primes $F_n$ is finite. 

2.9. Moduli of integers. We now give the proof of Theorems 3 and 2 
which we postponed from § 1.3. Another proof will be given in § 2.11 and 
a third in § 12.4. Throughout this section integer means rational integer, 
positive or negative. 

The proof depends upon the notion of a ‘modulus’ of numbers. A modulus 
is a system $S$ of numbers such that the sum and difference of any two 
members of $S$ are themselves members of $S$: i.e. 

(2.9.1) $m \in S \;.\; n \in S \to (m \pm n) \in S.$ 

The numbers of a modulus need not necessarily be integers or even rational; 
they may be complex numbers, or quaternions: but here we are concerned 
only with moduli of integers. 
The single number 0 forms a modulus (the null modulus). 

It follows from the definition of $S$ that 

$a \in S \to 0 = a - a \in S \;.\; 2a = a + a \in S.$ 

Repeating the argument, we see that $na \in S$ for any integral $n$ (positive or 
negative). More generally 

(2.9.2) $a \in S \;.\; b \in S \to xa + yb \in S$ 

for any integral $x, y$. On the other hand, it is obvious that, if $a$ and $b$ are 
given, the aggregate of values of $xa + yb$ forms a modulus. 

It is plain that any modulus $S$, except the null modulus, contains some 
positive numbers. Suppose that $d$ is the smallest positive number of $S$. If $n$ 
is any positive number of $S$, then $n - zd \in S$ for all $z$. If $c$ is the remainder 
when $n$ is divided by $d$ and 

$n = zd + c,$ 

then $c \in S$ and $0 \le c < d$. Since $d$ is the smallest positive number of $S$, 
we have $c = 0$ and $n = zd$. Hence 

Theorem 23. Any modulus, other than the null modulus, is the aggregate 
of integral multiples of a positive number $d$. 

We define the highest common divisor $d$ of two integers $a$ and $b$, not 
both zero, as the largest positive integer which divides both $a$ and $b$; and 
write 

$d = (a, b).$ 

Thus $(0, a) = |a|$. We may define the highest common divisor 

$(a, b, c, \ldots, k)$ 

of any set of positive integers $a, b, c, \ldots, k$ in the same way. 

The aggregate of numbers of the form 

$xa + yb,$ 

for integral $x, y$, is a modulus which, by Theorem 23, is the aggregate of 
multiples $zc$ of a certain positive $c$. Since $c$ divides every number of $S$, it 
divides $a$ and $b$, and therefore 

$c \le d.$ 
On the other hand, 

$d \mid a \;.\; d \mid b \to d \mid xa + yb,$ 

so that $d$ divides every number of $S$, and in particular $c$. It follows that 

$c = d$ 

and that $S$ is the aggregate of multiples of $d$. 

Theorem 24. The modulus $xa + yb$ is the aggregate of multiples of $d = (a, b)$. 

It is plain that we have proved incidentally 

Theorem 25. The equation 

$ax + by = n$ 

is soluble in integers $x, y$ if and only if $d \mid n$. In particular, 

$ax + by = d$ 

is soluble. 

Theorem 26. Any common divisor of $a$ and $b$ divides $d$. 
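
The argument above establishes that $ax + by = d$ is soluble but does not exhibit a solution. One standard way of computing one is the extended Euclidean algorithm, sketched below in Python. The algorithm is not the method of this section (which rests on the notion of a modulus); it is offered as an illustration, the function names are ours, and $a$, $b$ are assumed not both zero.

```python
def extended_gcd(a: int, b: int):
    """Return (d, x, y) with d = (a, b) > 0 and x*a + y*b = d.
    Assumes a and b are not both zero."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: old_r = old_x*a + old_y*b and r = x*a + y*b.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def solve_linear(a: int, b: int, n: int):
    """Return integers (x, y) with a*x + b*y = n, or None if d does not divide n."""
    d, x, y = extended_gcd(a, b)
    if n % d != 0:
        return None
    return x * (n // d), y * (n // d)


if __name__ == "__main__":
    d, x, y = extended_gcd(240, 46)
    print(d, x, y, 240 * x + 46 * y)
    print(solve_linear(240, 46, 7))  # (240, 46) = 2 does not divide 7
```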

2.10. Proof of the fundamental theorem of arithmetic. We are now 
in a position to prove Euclid’s theorem 3, and so Theorem 2. 

Suppose that $p$ is prime and $p \mid ab$. If $p \nmid a$ then $(a, p) = 1$, and therefore, 
by Theorem 24, there are an $x$ and a $y$ for which $xa + yp = 1$ or 

$xab + ypb = b.$ 

But $p \mid ab$ and $p \mid pb$, and therefore $p \mid b$. 

Practically the same argument proves 

Theorem 27: 

$(a, b) = d \;.\; c > 0 \to (ac, bc) = dc.$ 

For there are an $x$ and a $y$ for which $xa + yb = d$ or 

$xac + ybc = dc.$ 

Hence $(ac, bc) \mid dc$. On the other hand, $d \mid a \to dc \mid ac$ and $d \mid b \to dc \mid bc$; 
and therefore, by Theorem 26, $dc \mid (ac, bc)$. Hence $(ac, bc) = dc$. 
2.11. Another proof of the fundamental theorem. We call numbers 
which can be factorized into primes in more than one way abnormal. Let 
$n$ be the least abnormal number. The same prime $P$ cannot appear in two 
different factorizations of $n$, for, if it did, $n/P$ would be abnormal and 
$n/P < n$. We have then 

$n = p_1 p_2 p_3 \cdots = q_1 q_2 \cdots,$ 

where the $p$ and $q$ are primes, no $p$ is a $q$ and no $q$ is a $p$. 

We may take $p_1$ to be the least $p$; since $n$ is composite, $p_1^2 \le n$. Similarly, 
if $q_1$ is the least $q$, we have $q_1^2 \le n$ and, since $p_1 \ne q_1$, it follows that 
$p_1 q_1 < n$. Hence, if $N = n - p_1 q_1$, we have $0 < N < n$ and $N$ is not 
abnormal. Now $p_1 \mid n$ and so $p_1 \mid N$; similarly $q_1 \mid N$. Hence $p_1$ and $q_1$ both 
appear in the unique factorization of $N$ and $p_1 q_1 \mid N$. From this it follows 
that $p_1 q_1 \mid n$ and hence that $q_1 \mid n/p_1$. But $n/p_1$ is less than $n$ and so has the 
unique prime factorization $p_2 p_3 \cdots$. Since $q_1$ is not a $p$, this is impossible. 
Hence there cannot be any abnormal numbers and this is the fundamental 
theorem. 


NOTES 

§ 2.2. Mr. Ingham tells us that the argument used here is due to Bohr and Littlewood: 
see Ingham, 2. 

§ 2.3. For Theorems 11, 12, and 14, see Lucas, Theorie des nombres, i (1891), 353-4; 
and for Theorem 15 see Landau, Handbuch, 422-46, and Vorlesungen, i. 79-96. 

An interesting extension of Theorem 15 has been obtained by Shiu (J. London Math. 
Soc. (2) 61 (2000), 359-73). This says that for $a$ and $b$ as in Theorem 15, the sequence 
of primes contains arbitrarily long strings of consecutive elements, all of which are of the 
form $an + b$. Taking $a = 1000$ and $b = 777$ for example, this means that one can find as 
many consecutive primes as desired, each of which ends in the digits 777. 

§ 2.4. See Polya and Szegő, No. 94. 

§ 2.5. See Dickson, History, i, chs. i, xv, xvi, Rouse Ball, Mathematical recreations 
and essays, Ch. 2, and, for the earlier numerical results, Kraitchik, Theorie des nombres, 
i (Paris, 1922), 22, 218 and D. H. Lehmer, Bulletin Amer. Math. Soc. 38 (1932), 383-4. 
Miller and Wheeler (Nature 168 (1951), 838) give their large prime and Tuckerman (Proc. 
Nat. Acad. Sci. U.S.A. 68 (1971), 2319-20) gives the Mersenne prime $M_p$ with $p = 19937$ 
and references to the other smaller ones found by electronic computing. The discovery of 
the prime $M_p$ with $p = 21701$ was reported in the Times of 17th November, 1978. For 
factors of composite $F_m$ see Hallyburton and Brillhart, Math. Comp. 29 (1975), 109-12 
and, for a factor of $F_8$, see Brent, American Math. Soc. Abstracts, 1 (1980), 565. 

By 2007, $F_n$ was known to be composite and had been completely factored for the values 
$5 \le n \le 11$, while many factors had been discovered for larger $n$. It was known that $F_n$ is 
composite for $5 \le n \le 32$. The smallest $n$ for which no factor of $F_n$ had been discovered 
was $n = 14$. 
Similarly, by 2007, a total of 44 Mersenne primes had been discovered, the largest 
being $M_{32582657}$. The 39th Mersenne prime had been identified as $M_{13466917}$, but not all 
Mersenne numbers in between these two had been tested. 

Ferrier’s prime is $(2^{148} + 1)/17$ and is the largest prime found without the use of electronic 
computing (and may well remain so). 

The new large computers have made the subjects of factoring large numbers and of 
testing large numbers for primality very interesting and highly non-trivial. Guy (Proc. 5th 
Manitoba Conf. Numerical Math. 1975, 49-89) gives a full account of methods of factoring, 
some remarks about tests for primality and a substantial list of references on both topics. On 
tests for primality, see also, for example, Brillhart, Lehmer, and Selfridge, Math. Comp. 29 
(1975), 620-47 and Selfridge and Wunderlich, Proc. 4th Manitoba Conf. Numerical Math. 
1974, 109-20. 

Our proof that $641 \mid F_5$ is due to Coxeter (Introduction to geometry, New York, Wiley, 
1969), following Kraitchik and Bennett. 

Ribenboim, The new book of prime number records (Springer, New York, 1996) gives 
a full account of all the above work, and much besides. 

§ 2.6. See Erdős, Mathematica, B 7 (1938), 1-2. Theorem 19 was proved by Euler in 
1737. 

§ 2.7. Theorem 21 is due to Goldbach (1752) and Theorem 22 to Morgan Ward, Journal 
London Math. Soc. 5 (1930), 106-7. 

§ 2.8. See § 3 of the Appendix. 

§§ 2.9-10. The argument follows the lines of Hecke, ch. i. The definition of a modulus 
is the natural one, but is redundant. It is sufficient to assume that 

$m \in S \;.\; n \in S \to m - n \in S.$ 

For then 

$0 = n - n \in S, \quad -n = 0 - n \in S, \quad m + n = m - (-n) \in S.$ 

§ 2.11. F. A. Lindemann, Quart. J. of Math. (Oxford), 4 (1933), 319-20, and Davenport, 
Higher arithmetic, 20. For somewhat similar proofs, see Zermelo, Göttinger Nachrichten 
(new series), i (1934), 43-4, and Hasse, Journal für Math. 159 (1928), 3-6. 



III 

FAREY SERIES AND A THEOREM OF MINKOWSKI 


3.1. The definition and simplest properties of a Farey series. In this 
chapter we shall be concerned primarily with certain properties of the ‘positive 
rationals’ or ‘vulgar fractions’, such as $\tfrac{1}{2}$ or $\tfrac{7}{11}$. Such a fraction may 
be regarded as a relation between two positive integers, and the theorems 
which we prove embody properties of the positive integers. 

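The Farey series $\mathfrak{F}_n$ of order $n$ is the ascending series of irreducible 
fractions between 0 and 1 whose denominators do not exceed $n$. Thus $h/k$ 
belongs to $\mathfrak{F}_n$ if 

(3.1.1) $0 \le h \le k \le n, \quad (h, k) = 1;$ 

the numbers 0 and 1 are included in the forms $\frac{0}{1}$ and $\frac{1}{1}$. For example, $\mathfrak{F}_5$ is 

$\frac{0}{1}, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \frac{1}{1}.$ 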
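
The definition translates directly into a short program. The Python sketch below (the function name `farey` is ours) lists every $h/k$ satisfying (3.1.1) and sorts the result; it uses the standard library's `Fraction` type so that the comparison of fractions is exact.

```python
from fractions import Fraction
from math import gcd


def farey(n: int) -> list:
    """The Farey series of order n, as an ascending list of Fractions."""
    terms = {
        Fraction(h, k)
        for k in range(1, n + 1)
        for h in range(0, k + 1)
        if gcd(h, k) == 1
    }
    return sorted(terms)


if __name__ == "__main__":
    print(", ".join(f"{t.numerator}/{t.denominator}" for t in farey(5)))
```

This direct enumeration takes of the order of $n^2$ steps; it is meant only to make the definition concrete.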

The characteristic properties of Farey series are expressed by the following 
theorems. 

Theorem 28. If $h/k$ and $h'/k'$ are two successive terms of $\mathfrak{F}_n$, then 

(3.1.2) $kh' - hk' = 1.$ 

Theorem 29. If $h/k$, $h''/k''$, and $h'/k'$ are three successive terms of 
$\mathfrak{F}_n$, then 

(3.1.3) $\frac{h''}{k''} = \frac{h + h'}{k + k'}.$ 

We shall prove that the two theorems are equivalent in the next section, 
and then give three different proofs of both of them, in §§ 3.3, 3.4, and 
3.7 respectively. We conclude this section by proving two still simpler 
properties of $\mathfrak{F}_n$. 

Theorem 30. If $h/k$ and $h'/k'$ are two successive terms of $\mathfrak{F}_n$, then 

(3.1.4) $k + k' > n.$ 

The ‘mediant’ 

$\frac{h + h'}{k + k'}$† 

of $h/k$ and $h'/k'$ falls in the interval 

$\left( \frac{h}{k}, \frac{h'}{k'} \right).$ 

Hence, unless (3.1.4) is true, there is another term of $\mathfrak{F}_n$ between $h/k$ and 
$h'/k'$. 

Theorem 31. If $n > 1$, then no two successive terms of $\mathfrak{F}_n$ have the same 
denominator. 

If $k > 1$ and $h'/k$ succeeds $h/k$ in $\mathfrak{F}_n$, then $h + 1 \le h' < k$. But then 

$\frac{h}{k} < \frac{h}{k - 1} < \frac{h + 1}{k} \le \frac{h'}{k},$ 

and $h/(k - 1)$† comes between $h/k$ and $h'/k$ in $\mathfrak{F}_n$, a contradiction. 

† Or the reduced form of this fraction. 
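
Before proving Theorems 28 and 29 it is instructive to see them hold in small cases. The following Python sketch checks the four properties stated in this section for every Farey series up to a chosen order; the function names and the bound 30 are ours, and the series is generated by direct enumeration.

```python
from fractions import Fraction
from math import gcd


def farey(n: int) -> list:
    """The Farey series of order n, as an ascending list of Fractions."""
    terms = {
        Fraction(h, k)
        for k in range(1, n + 1)
        for h in range(0, k + 1)
        if gcd(h, k) == 1
    }
    return sorted(terms)


def check_farey_properties(n: int) -> bool:
    """Check Theorems 28, 29, 30, and 31 for the Farey series of order n."""
    s = farey(n)
    for a, b in zip(s, s[1:]):
        h, k = a.numerator, a.denominator
        h1, k1 = b.numerator, b.denominator
        if k * h1 - h * k1 != 1:       # Theorem 28
            return False
        if k + k1 <= n:                # Theorem 30
            return False
        if n > 1 and k == k1:          # Theorem 31
            return False
    for a, m, b in zip(s, s[1:], s[2:]):
        mediant = Fraction(a.numerator + b.numerator,
                           a.denominator + b.denominator)
        if m != mediant:               # Theorem 29
            return False
    return True


if __name__ == "__main__":
    print(all(check_farey_properties(n) for n in range(1, 31)))
```

Note that `Fraction` reduces the mediant automatically, which is why Theorem 29 is checked as an equality of fractions and not of numerators and denominators separately.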

3.2. The equivalence of the two characteristic properties. We now 
prove that each of Theorems 28 and 29 implies the other. 

(1) Theorem 28 implies Theorem 29. If we assume Theorem 28, and 
solve the equations 

(3.2.1) $kh'' - hk'' = 1, \quad k''h' - h''k' = 1$ 

for $h''$ and $k''$, we obtain 

$h''(kh' - hk') = h + h', \quad k''(kh' - hk') = k + k',$ 

and so (3.1.3). 

(2) Theorem 29 implies Theorem 28. We assume that Theorem 29 is true 
generally and that Theorem 28 is true for $\mathfrak{F}_{n-1}$, and deduce that Theorem 
28 is true for $\mathfrak{F}_n$. It is plainly sufficient to prove that the equations (3.2.1) 
are satisfied when $h''/k''$ belongs to $\mathfrak{F}_n$ but not to $\mathfrak{F}_{n-1}$, so that $k'' = n$. 
In this case, after Theorem 31, both $k$ and $k'$ are less than $k''$, and $h/k$ and 
$h'/k'$ are consecutive terms in $\mathfrak{F}_{n-1}$. 

Since (3.1.3) is true ex hypothesi, and $h''/k''$ is irreducible, we have 

$h + h' = \lambda h'', \quad k + k' = \lambda k'',$ 

where $\lambda$ is an integer. Since $k$ and $k'$ are both less than $k''$, $\lambda$ must be 1. 
Hence 

$h'' = h + h', \quad k'' = k + k',$ 
$kh'' - hk'' = kh' - hk' = 1;$ 

and similarly 

$k''h' - h''k' = 1.$ 

3.3. First proof of Theorems 28 and 29. Our first proof is a natural 
development of the ideas used in § 3.2. 

The theorems are true for $n = 1$; we assume them true for $\mathfrak{F}_{n-1}$ and 
prove them true for $\mathfrak{F}_n$. 

Suppose that $h/k$ and $h'/k'$ are consecutive in $\mathfrak{F}_{n-1}$ but separated by 
$h''/k''$ in $\mathfrak{F}_n$.† Let 

(3.3.1) $kh'' - hk'' = r > 0, \quad k''h' - h''k' = s > 0.$ 

Solving these equations for $h''$ and $k''$, and remembering that 

$kh' - hk' = 1,$ 

we obtain 

(3.3.2) $h'' = sh + rh', \quad k'' = sk + rk'.$ 

Here $(r, s) = 1$, since $(h'', k'') = 1$. 

Consider now the set $S$ of all fractions 

(3.3.3) $\frac{H}{K} = \frac{\mu h + \lambda h'}{\mu k + \lambda k'}$ 

in which $\lambda$ and $\mu$ are positive integers and $(\lambda, \mu) = 1$. Thus $h''/k''$ belongs 
to $S$. Every fraction of $S$ lies between $h/k$ and $h'/k'$, and is in its lowest 
terms, since any common divisor of $H$ and $K$ would divide 

$k(\mu h + \lambda h') - h(\mu k + \lambda k') = \lambda$ 

and 

$h'(\mu k + \lambda k') - k'(\mu h + \lambda h') = \mu.$ 

Hence every fraction of $S$ appears sooner or later in some $\mathfrak{F}_q$; and plainly 
the first to make its appearance is that for which $K$ is least, i.e. that for 
which $\lambda = 1$ and $\mu = 1$. This fraction must be $h''/k''$, and so 

(3.3.4) $h'' = h + h', \quad k'' = k + k'.$ 

If we substitute these values for $h''$, $k''$ in (3.3.1), we see that $r = s = 1$. 
This proves Theorem 28 for $\mathfrak{F}_n$. The equations (3.3.4) are not generally 
true for three successive fractions of $\mathfrak{F}_n$, but are (as we have shown) true 
when the central fraction has made its first appearance in $\mathfrak{F}_n$. 

† After Theorem 31, $h''/k''$ is the only term of $\mathfrak{F}_n$ between $h/k$ and $h'/k'$; but we do not assume 
this in the proof. 

3.4. Second proof of the theorems. This proof is not inductive, and
gives a rule for the construction of the term which succeeds h/k in $\mathfrak{F}_n$.
Since (h, k) = 1, the equation

$$kx - hy = 1 \tag{3.4.1}$$

is soluble in integers (Theorem 25). If $x_0, y_0$ is a solution then

$$x_0 + rh, \qquad y_0 + rk$$

is also a solution for any positive or negative integral r. We can choose r
so that $n - k < y_0 + rk \le n$. There is therefore a solution (x, y) of (3.4.1)
such that

$$(x, y) = 1, \qquad 0 \le n - k < y \le n. \tag{3.4.2}$$

Since x/y is in its lowest terms, and $y \le n$, x/y is a fraction of $\mathfrak{F}_n$. Also

$$\frac{x}{y} = \frac{h}{k} + \frac{1}{ky} > \frac{h}{k},$$

so that x/y comes later in $\mathfrak{F}_n$ than h/k. If it is not h'/k', it comes later than
h'/k', and

$$\frac{x}{y} - \frac{h'}{k'} = \frac{k'x - h'y}{k'y} \ge \frac{1}{k'y},$$

while

$$\frac{h'}{k'} - \frac{h}{k} = \frac{kh' - hk'}{kk'} \ge \frac{1}{kk'}.$$

Hence

$$\frac{1}{ky} = \frac{kx - hy}{ky} = \frac{x}{y} - \frac{h}{k} \ge \frac{1}{k'y} + \frac{1}{kk'} = \frac{k + y}{kk'y} > \frac{n}{kk'y} \ge \frac{1}{ky},$$

by (3.4.2). This is a contradiction, and therefore x/y must be h'/k', and

$$kh' - hk' = 1.$$

Thus, to find the successor of 4/9 in $\mathfrak{F}_{13}$, we begin by finding some solution $(x_0, y_0)$ of
$9x - 4y = 1$, e.g. $x_0 = 1$, $y_0 = 2$. We then choose r so that 2 + 9r lies between
13 - 9 = 4 and 13. This gives r = 1, x = 1 + 4r = 5, y = 2 + 9r = 11, and the fraction
required is 5/11.
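
The rule just described can be sketched as a short routine. This is a minimal illustration under the assumptions 0 <= h < k <= n and (h, k) = 1; instead of first finding one solution and then shifting it by r, it simply tries the k candidate values of y allowed by (3.4.2). The name `farey_successor` is hypothetical.

```python
def farey_successor(h, k, n):
    """Return (x, y), the term that follows h/k in the Farey series of order n.

    Assumes 0 <= h < k <= n and gcd(h, k) == 1.
    """
    # The k consecutive values n-k < y <= n cover every residue class mod k,
    # so exactly one of them makes 1 + h*y divisible by k, i.e. solves
    # kx - hy = 1 with an integral x.
    for y in range(n - k + 1, n + 1):
        if (1 + h * y) % k == 0:
            return (1 + h * y) // k, y
    raise ValueError("no solution: h and k must be coprime")
```

Starting from 0/1 and applying the routine repeatedly walks through the whole series of order n until 1/1 is reached.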

3.5. The integral lattice. Our third and last proof depends on simple 
but important geometrical ideas. 

Suppose that we are given an origin O in the plane and two points P, Q
not collinear with O. We complete the parallelogram OPQR, produce its
sides indefinitely, and draw the two systems of equidistant parallels of
which OP, QR and OQ, PR are consecutive pairs, thus dividing the plane
into an infinity of equal parallelograms. Such a figure is called a lattice
(Gitter).

A lattice is a figure of lines. It defines a figure of points, viz. the system
of points of intersection of the lines, or lattice points. Such a system
we call a point-lattice.

Two different lattices may determine the same point-lattice; thus in
Fig. 1 the lattices based on OP, OQ and on OP, OR determine the same
system of points. Two lattices which determine the same point-lattice are
said to be equivalent.

It is plain that any lattice point of a lattice might be regarded as the origin 
O, and that the properties of the lattice are independent of the choice of 
origin and symmetrical about any origin. 

One type of lattice is particularly important here. This is the lattice which 
is formed (when the rectangular coordinate axes are given) by parallels to 
the axes at unit distances, dividing the plane into unit squares. We call 
this the fundamental lattice L, and the point-lattice which it determines,
viz. the system of points (x, y) with integral coordinates, the fundamental
point-lattice $\Lambda$.

Any point-lattice may be regarded as a system of numbers or vectors,
the complex coordinates x + iy of the lattice points or the vectors to these
points from the origin. Such a system is plainly a modulus in the sense of
§ 2.9. If P and Q are the points $(x_1, y_1)$ and $(x_2, y_2)$, then the coordinates of
any point S of the lattice based upon OP and OQ are

$$x = mx_1 + nx_2, \qquad y = my_1 + ny_2,$$

where m and n are integers; or if $z_1$ and $z_2$ are the complex coordinates of
P and Q, then the complex coordinate of S is

$$z = mz_1 + nz_2.$$


3.6. Some simple properties of the fundamental lattice. (1) We now
consider the transformation defined by

$$x' = ax + by, \qquad y' = cx + dy, \tag{3.6.1}$$

where a, b, c, d are given, positive or negative, integers with $ad - bc \ne 0$.
It is plain that any point (x, y) of $\Lambda$ is transformed into another point (x', y')
of $\Lambda$.

Solving (3.6.1) for x and y, we obtain

$$x = \frac{dx' - by'}{ad - bc}, \qquad y = -\frac{cx' - ay'}{ad - bc}. \tag{3.6.2}$$

If

$$\Delta = ad - bc = \pm 1, \tag{3.6.3}$$

then any integral values of x' and y' give integral values of x and y, and
every lattice point (x', y') corresponds to a lattice point (x, y). In this case
$\Lambda$ is transformed into itself.

Conversely, if $\Lambda$ is transformed into itself, every integral (x', y') must
give an integral (x, y). Taking in particular (x', y') to be (1, 0) and (0, 1),
we see that

$$\Delta \mid d, \quad \Delta \mid b, \quad \Delta \mid c, \quad \Delta \mid a,$$

and so

$$\Delta^2 \mid ad - bc, \qquad \Delta^2 \mid \Delta.$$

Hence $\Delta = \pm 1$.

We have thus proved

Theorem 32. A necessary and sufficient condition that the transformation
(3.6.1) should transform $\Lambda$ into itself is that $\Delta = \pm 1$.

We call such a transformation unimodular. 
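
As an illustration of Theorem 32, the following sketch compares the determinant test with the test used in the proof, namely whether the points (1, 0) and (0, 1) have integral preimages under (3.6.1). The function names are hypothetical; exact rational arithmetic is used so that no rounding can blur the distinction between integral and non-integral values.

```python
from fractions import Fraction


def is_unimodular(a, b, c, d):
    """True when the determinant ad - bc is +1 or -1."""
    return a * d - b * c in (1, -1)


def preimage(a, b, c, d, xp, yp):
    """Solve x' = ax + by, y' = cx + dy for (x, y), as in (3.6.2)."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("the transformation must have ad - bc != 0")
    return Fraction(d * xp - b * yp, det), Fraction(a * yp - c * xp, det)


def maps_lattice_onto_itself(a, b, c, d):
    """Test of the proof: (1, 0) and (0, 1) must come from lattice points."""
    corners = [preimage(a, b, c, d, 1, 0), preimage(a, b, c, d, 0, 1)]
    return all(v.denominator == 1 for point in corners for v in point)
```

For example, a = 2, b = 1, c = 1, d = 1 passes both tests, while a = 2, b = 0, c = 0, d = 1 (determinant 2) fails both, since (1, 0) would have to come from (1/2, 0).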

(2) Suppose now P = (a, c) and Q = (b, d) are points of $\Lambda$ not collinear
with O. The area of the parallelogram defined by OP and OQ is

$$\delta = \pm(ad - bc) = |ad - bc|,$$

the sign being chosen to make $\delta$ positive. The points (x', y') of the lattice
$\Lambda'$ based on OP and OQ are given by

$$x' = xa + yb, \qquad y' = xc + yd,$$

where x and y are arbitrary integers. After Theorem 32, a necessary and
sufficient condition that $\Lambda'$ should be identical with $\Lambda$ is that $\delta = 1$.

Theorem 33. A necessary and sufficient condition that the lattice L'
based upon OP and OQ should be equivalent to L is that the area of the
parallelogram defined by OP and OQ should be unity.

(3) We call a point P of $\Lambda$ visible (i.e. visible from the origin) if there
is no point of $\Lambda$ on OP between O and P. In order that (x, y) should be
visible, it is necessary and sufficient that x/y should be in its lowest terms,
or (x, y) = 1.

Theorem 34. Suppose that P and Q are visible points of $\Lambda$, and that $\delta$ is
the area of the parallelogram J defined by OP and OQ. Then

(i) if $\delta = 1$, there is no point of $\Lambda$ inside J;

(ii) if $\delta > 1$, there is at least one point of $\Lambda$ inside J, and, unless that
point is the intersection of the diagonals of J, at least two, one in each of
the triangles into which J is divided by PQ.

There is no point of $\Lambda$ inside J if and only if the lattice L' based on OP
and OQ is equivalent to L, i.e. if and only if $\delta = 1$. If $\delta > 1$, there is at
least one such point S. If R is the fourth vertex of the parallelogram J, and
RT is parallel and equal to OS, but with the opposite sense, then (since the
properties of a lattice are symmetrical, and independent of the particular
lattice point chosen as origin) T is also a point of $\Lambda$, and there are at least
two points of $\Lambda$ inside J unless T coincides with S. This is the special case
mentioned under (ii).

The different cases are illustrated in Figs. 2a, 2b, 2c. 
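
The cases of Theorem 34 can be explored numerically. The sketch below is an illustration with hypothetical names: it lists the lattice points strictly inside the parallelogram J on OP and OQ by writing each candidate point as sP + tQ and requiring 0 < s < 1 and 0 < t < 1, using integer arithmetic only.

```python
def interior_lattice_points(P, Q):
    """Lattice points strictly inside the parallelogram with sides OP and OQ."""
    (p1, p2), (q1, q2) = P, Q
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("O, P, Q must not be collinear")
    delta = abs(det)                  # area of the parallelogram
    sign = 1 if det > 0 else -1
    xs = (0, p1, q1, p1 + q1)         # x-coordinates of the four vertices
    ys = (0, p2, q2, p2 + q2)         # y-coordinates of the four vertices
    points = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            # (x, y) = sP + tQ with s = s_num / delta and t = t_num / delta.
            s_num = sign * (x * q2 - y * q1)
            t_num = sign * (p1 * y - p2 * x)
            if 0 < s_num < delta and 0 < t_num < delta:
                points.append((x, y))
    return points
```

With P = (2, 1), Q = (1, 1) the area is 1 and no point is found; with P = (2, 1), Q = (1, 2) the area is 3 and the two points (1, 1) and (2, 2) appear, one in each triangle; with P = (1, 1), Q = (1, -1) the area is 2 and the single point (1, 0) is the intersection of the diagonals.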

3.7. Third proof of Theorems 28 and 29. The fractions h/k with

$$0 \le h \le k \le n, \qquad (h, k) = 1$$

are the fractions of $\mathfrak{F}_n$, and correspond to the visible points (k, h) of $\Lambda$
inside, or on the boundary of, the triangle defined by the lines y = 0,
y = x, x = n.

If we draw a ray through O and rotate it round the origin in the counter-clockwise
direction from an initial position along the axis of x, it will pass
in turn through each point (k, h) representative of a Farey fraction. If P and
P' are points (k, h) and (k', h') representing consecutive fractions, there is
no representative point inside the triangle OPP' or on the join PP', and
therefore, by Theorem 34,

$$kh' - hk' = 1.$$
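
The geometric picture can be checked directly. In this sketch (names hypothetical) the visible points (k, h) of the triangle are collected, ordered by the slope h/k of the ray through them, and the relation kh' - hk' = 1 is then tested for neighbouring points.

```python
from fractions import Fraction
from math import gcd


def visible_points_in_order(n):
    """Visible lattice points (k, h) with 0 <= h <= k <= n, ordered by the ray's slope."""
    points = [
        (k, h)
        for k in range(1, n + 1)
        for h in range(0, k + 1)
        if gcd(k, h) == 1
    ]
    # Exact comparison of slopes h/k avoids floating-point ties.
    points.sort(key=lambda p: Fraction(p[1], p[0]))
    return points


def neighbours_have_unit_area(points):
    """Check kh' - hk' = 1 for consecutive points (k, h), (k', h')."""
    return all(k * h2 - h * k2 == 1 for (k, h), (k2, h2) in zip(points, points[1:]))
```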

3.8. The Farey dissection of the continuum. It is often convenient to 
represent the real numbers on a circle instead of, as usual, on a straight 
line, the object of the circular representation being to eliminate integral 
parts. We take a circle C of unit circumference, and an arbitrary point 
O of the circumference as the representative of 0, and represent x by the 
point $P_x$ whose distance from O, measured round the circumference in the
counter-clockwise direction, is x. Plainly all integers are represented by 
the same point O, and numbers which differ by an integer have the same 
representative point. 

It is sometimes useful to divide up the circumference of C in the
following manner. We take the Farey series $\mathfrak{F}_n$, and form all the mediants

$$\mu = \frac{h + h'}{k + k'}$$

of successive pairs h/k, h'/k'. The first and last mediants are

$$\frac{0 + 1}{1 + n} = \frac{1}{n + 1}, \qquad \frac{n - 1 + 1}{n + 1} = \frac{n}{n + 1}.$$

The mediants naturally do not belong themselves to $\mathfrak{F}_n$.

We now represent each mediant $\mu$ by the point $P_\mu$. The circle is thus
divided up into arcs which we call Farey arcs, each bounded by two points
$P_\mu$ and containing one Farey point, the representative of a term of $\mathfrak{F}_n$. Thus

$$\left( \frac{n}{n + 1}, \frac{1}{n + 1} \right)$$

is a Farey arc containing the one Farey point O. The aggregate of Farey
arcs we call the Farey dissection of the circle.

In what follows we suppose that n > 1. If $P_{h/k}$ is a Farey point, and
$h_1/k_1$, $h_2/k_2$ are the terms of $\mathfrak{F}_n$ which precede and follow h/k, then the
Farey arc round $P_{h/k}$ is composed of two parts, whose lengths are

$$\frac{h}{k} - \frac{h + h_1}{k + k_1} = \frac{1}{k(k + k_1)}, \qquad \frac{h + h_2}{k + k_2} - \frac{h}{k} = \frac{1}{k(k + k_2)}$$

respectively. Now $k + k_1 < 2n$, since k and $k_1$ are unequal (Theorem 31)
and neither exceeds n; and $k + k_1 > n$, by Theorem 30. We thus obtain

Theorem 35. In the Farey dissection of order n, where n > 1, each part
of the arc which contains the representative of h/k has a length between

$$\frac{1}{k(2n - 1)} \quad \text{and} \quad \frac{1}{k(n + 1)}.$$

The dissection, in fact, has a certain ‘uniformity’ which explains its 
importance. 

We use the Farey dissection here to prove a simple theorem concerning 
the approximation of arbitrary real numbers by rationals, a topic to which 
we shall return in Ch. XI. 


Theorem 36. If $\xi$ is any real number, and n a positive integer, then there
is an irreducible fraction h/k such that

$$0 < k \le n, \qquad \left| \xi - \frac{h}{k} \right| \le \frac{1}{k(n + 1)}. \tag{3.8.1}$$

We may suppose that $0 < \xi < 1$. Then $\xi$ falls in an interval bounded by
two successive fractions of $\mathfrak{F}_n$, say h/k and h'/k', and therefore in one of
the intervals

$$\left( \frac{h}{k}, \frac{h + h'}{k + k'} \right), \qquad \left( \frac{h + h'}{k + k'}, \frac{h'}{k'} \right).$$

Hence, after Theorem 35, either h/k or h'/k' satisfies the conditions: h/k if
$\xi$ falls in the first interval, h'/k' if it falls in the second.
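
The proof of Theorem 36 is constructive, and a minimal sketch of it follows. The way the two neighbouring Farey fractions are located here (repeatedly replacing one end of the bracket by the mediant while the mediant's denominator does not exceed n) is a choice made for this illustration and is not taken from the text; the name `farey_approximation` is likewise hypothetical. The integral part of the number is removed first and restored at the end.

```python
from fractions import Fraction
from math import floor


def farey_approximation(xi, n):
    """Return (h, k) with 0 < k <= n and |xi - h/k| <= 1/(k(n+1))."""
    xi = Fraction(xi)
    whole = floor(xi)
    frac = xi - whole                       # now 0 <= frac < 1
    (h, k), (h2, k2) = (0, 1), (1, 1)       # h/k <= frac < h2/k2 throughout
    while k + k2 <= n:
        mh, mk = h + h2, k + k2             # mediant of the current bracket
        if frac < Fraction(mh, mk):
            h2, k2 = mh, mk
        else:
            h, k = mh, mk
    # h/k and h2/k2 are now consecutive terms of order n; choose the one
    # whose half of the interval (split at the mediant) contains frac.
    if frac <= Fraction(h + h2, k + k2):
        best_h, best_k = h, k
    else:
        best_h, best_k = h2, k2
    return best_h + whole * best_k, best_k
```

For instance, with the number 1/3 and n = 5 the bracket ends at 1/3 and 2/5, and the routine returns the fraction 1/3 itself.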

3.9. A theorem of Minkowski. If P and Q are points of $\Lambda$, P' and
Q' the points symmetrical to P and Q about the origin, and we add to the
parallelogram J of Theorem 34 the three parallelograms based on OQ, OP',
on OP', OQ', and on OQ', OP, we obtain a parallelogram K whose centre
is the origin and whose area $4\delta$ is four times that of J. If $\delta$ has the value 1 (its
least possible value) there are points of $\Lambda$ on the boundary of K, but none,
except O, inside. If $\delta > 1$, then there are points of $\Lambda$, other than O, inside
K. This is a very special case of a famous theorem of Minkowski, which
asserts that the same property is possessed, not only by any parallelogram
symmetrical about the origin (whether generated by points of $\Lambda$ or not),
but by any ‘convex region’ symmetrical about the origin.

An open region R is a set of points with the properties (1) if P belongs 
to R, then all points of the plane sufficiently near to P belong to R, (2) any 
two points of R can be joined by a continuous curve lying entirely in R. 
We may also express (1) by saying that any point of R is an interior point 
of R. Thus the inside of a circle or a parallelogram is an open region. The 
boundary C of R is the set of points which are limit points of R but do not 
themselves belong to R. Thus the boundary of a circle is its circumference. 
A closed region R* is an open region R together with its boundary. We 
consider only bounded regions. 

There are two natural definitions of a convex region, which may be 
shown to be equivalent. First, we may say that R (or R*) is convex if every 
point of any chord of R, i.e. of any line joining two points of R, belongs to 
R. Secondly, we may say that R (or R*) is convex if it is possible, through
every point P of C, to draw at least one line l such that the whole of R
lies on one side of l. Thus a circle and a parallelogram are convex; for the
circle, l is the tangent at P, while for the parallelogram every line l is a side
except at the vertices, where there are an infinity of lines with the property
required.

It is easy to prove the equivalence of the two definitions. Suppose first 
that R is convex according to the second definition, that P and Q belong to 
R, and that a point S of PQ does not. Then there is a point T of C (which 
may be S itself) on PS, and a line l through T which leaves R entirely on
one side; and, since all points sufficiently near to P or Q belong to R, this
is a contradiction.

Secondly, suppose that R is convex according to the first definition and
that P is a point of C; and consider the set L of lines joining P to points of
R. If $Y_1$ and $Y_2$ are points of R, and Y is a point of $Y_1Y_2$, then Y is a point of
R and PY a line of L. Hence there is an angle APB such that every line from
P within APB, and no line outside APB, belongs to L. If $APB > \pi$, then
there are points D, E of R such that DE passes through P, in which case P
belongs to R and not to C, a contradiction. Hence $APB \le \pi$. If $APB = \pi$,
then AB is a line l; if $APB < \pi$, then any line through P, outside the angle,
is a line l.

It is plain that convexity is invariant for translations and for magnifications
about a point O.

A convex region R has an area (definable, for example, as the upper
bound of the areas of networks of small squares whose vertices lie in R).

Theorem 37. (Minkowski’s Theorem). Any convex region R symmetrical
about O, and of area greater than 4, includes points of $\Lambda$ other
than O.

3.10. Proof of Minkowski’s theorem. We begin by proving a simple
theorem whose truth is ‘intuitive’.

Theorem 38. Suppose that $R_O$ is an open region including O, that $R_P$
is the congruent and similarly situated region about any point P of $\Lambda$,
and that no two of the regions $R_P$ overlap. Then the area of $R_O$ does not
exceed 1.

The theorem becomes ‘obvious’ when we consider that, if $R_O$ were the
square bounded by the lines $x = \pm\frac12$, $y = \pm\frac12$, then the area of $R_O$ would
be 1 and the regions $R_P$, with their boundaries, would cover the plane. We
may give an exact proof as follows.

Suppose that $\Delta$ is the area of $R_O$, and A the maximum distance of a point
of $C_O$† from O; and that we consider the $(2n + 1)^2$ regions $R_P$ corresponding
to points of $\Lambda$ whose coordinates are not greater numerically than n. All
these regions lie in the square whose sides are parallel to the axes and at a
distance n + A from O. Hence (since the regions do not overlap)

$$(2n + 1)^2 \Delta \le (2n + 2A)^2, \qquad \Delta \le \left( 1 + \frac{A - \frac12}{n + \frac12} \right)^2,$$

and the result follows when we make n tend to infinity.

It is to be noticed that there is no reference to symmetry or to convexity 
in Theorem 38. 

It is now easy to prove Minkowski’s theorem. Minkowski himself gave 
two proofs, based on the two definitions of convexity. 

(1) Take the first definition, and suppose that $R_O$ is the result of contracting
R about O to half its linear dimensions. Then the area of $R_O$ is greater
than 1, so that two of the regions $R_P$ of Theorem 38 overlap, and there is
a lattice-point P such that $R_O$ and $R_P$ overlap. Let Q (Fig. 3a) be a point
common to $R_O$ and $R_P$. If OQ' is equal and parallel to PQ, and Q'' is the
image of Q' in O, then Q', and therefore Q'', lies in $R_O$; and therefore, by
the definition of convexity, the middle point of QQ'' lies in $R_O$. But this
point is the middle point of OP; and therefore P lies in R.

† We use C systematically for the boundary of the corresponding R.

(2) Take the second definition, and suppose that there is no lattice point
but O in R. Expand R* about O until, as R'*, it first includes a lattice point
P. Then P is a point of C', and there is a line l, say l', through P (Fig. 3b).
If $R_O$ is R' contracted about O to half its linear dimensions, and $l_O$ is the
parallel to l' through the middle point of OP, then $l_O$ is a line l for $R_O$. It is
plainly also a line l for $R_P$, and leaves $R_O$ and $R_P$ on opposite sides, so that
$R_O$ and $R_P$ do not overlap. A fortiori $R_O$ does not overlap any other $R_P$,
and, since the area of $R_O$ is greater than 1, this contradicts Theorem 38.

There are a number of interesting alternative proofs, of which perhaps 
the simplest is one due to Mordell. 

If R is convex and symmetrical about O, and $P_1$ and $P_2$ are points of R
with coordinates $(x_1, y_1)$ and $(x_2, y_2)$, then $(-x_2, -y_2)$, and therefore the
point M whose coordinates are $\frac12(x_1 - x_2)$ and $\frac12(y_1 - y_2)$, is also a point
of R.

The lines x = 2p/t, y = 2q/t, where t is a fixed positive integer and
p and q arbitrary integers, divide up the plane into squares, of area $4/t^2$,
whose corners are (2p/t, 2q/t). If N(t) is the number of corners in R, and
A the area of R, then plainly $4t^{-2}N(t) \to A$ when $t \to \infty$; and if A > 4
then $N(t) > t^2$ for large t. But the pairs (p, q) give at most $t^2$ different pairs
of remainders when p and q are divided by t; and therefore there are two
points $P_1$ and $P_2$ of R, with coordinates $2p_1/t$, $2q_1/t$ and $2p_2/t$, $2q_2/t$, such
that $p_1 - p_2$ and $q_1 - q_2$ are both divisible by t. Hence the point M, which
belongs to R, is a point of $\Lambda$.
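
Mordell's argument is a pigeonhole argument and can be turned into a search procedure. The sketch below is an illustration under stated assumptions: the region is supplied as a membership test `contains(x, y)` for a convex region symmetrical about O, `bound` is any number such that the region lies in the square |x|, |y| <= bound, and `max_t` is an arbitrary cut-off. All three names, and the function name itself, are hypothetical.

```python
from fractions import Fraction


def mordell_lattice_point(contains, bound, max_t=64):
    """Search for a lattice point other than O by Mordell's pigeonhole argument.

    Returns an integer pair (x, y), or None if no collision is found for
    t <= max_t (which cannot happen for large enough max_t when the area
    of the region exceeds 4).
    """
    for t in range(1, max_t + 1):
        seen = {}                              # (p mod t, q mod t) -> (p, q)
        limit = int(bound * t / 2) + 1         # corners (2p/t, 2q/t) inside the box
        for p in range(-limit, limit + 1):
            for q in range(-limit, limit + 1):
                if not contains(Fraction(2 * p, t), Fraction(2 * q, t)):
                    continue
                key = (p % t, q % t)
                if key in seen:
                    p2, q2 = seen[key]
                    # M = ((x1 - x2)/2, (y1 - y2)/2) has integral coordinates.
                    return (p - p2) // t, (q - q2) // t
                seen[key] = (p, q)
    return None
```

As a usage example, the parallelogram given by |x - 3y| < 1/2 and |y| < 21/10 has area 4.2; calling the routine with that membership test and bound = 7 returns a lattice point on the line x = 3y.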

3.11. Developments of Theorem 37. There are some further developments
of Theorem 37 which will be wanted in Ch. XXIV and which it is
natural to prove here. We begin with a general remark which applies to all
the theorems of §§ 3.6 and 3.9-10.

We have been interested primarily in the ‘fundamental’ lattice L (or $\Lambda$),
but we can see in various ways how its properties may be restated as general
properties of lattices. We use L or $\Lambda$ now for any lattice of lines or points. If
it is based upon the points O, P, Q, as in § 3.5, then we call the parallelogram
OPRQ the fundamental parallelogram of L or $\Lambda$.

(i) We may set up a system of oblique Cartesian coordinates with OP,
OQ as axes, and agree that P and Q are the points (1, 0) and (0, 1). The
area of the fundamental parallelogram is then

$$\delta = OP \cdot OQ \cdot \sin\omega,$$

where $\omega$ is the angle between OP and OQ. The arguments of § 3.6,
interpreted in this system of coordinates, then prove

Theorem 39. A necessary and sufficient condition that the transformation
(3.6.1) shall transform $\Lambda$ into itself is that $\Delta = \pm 1$.

Theorem 40. If P and Q are any two points of $\Lambda$, then a necessary and
sufficient condition that the lattice L' based upon OP and OQ should be
equivalent to L is that the area of the parallelogram defined by OP, OQ
should be equal to that of the fundamental parallelogram of $\Lambda$.

(ii) The transformation

$$x' = \alpha x + \beta y, \qquad y' = \gamma x + \delta y$$

(where now $\alpha, \beta, \gamma, \delta$ are any real numbers)† transforms the fundamental
lattice of § 3.5 into the lattice based upon the origin and the points
$(\alpha, \gamma)$, $(\beta, \delta)$. It transforms lines into lines and triangles into triangles.
If the triangle $P_1P_2P_3$, where $P_i$ is the point $(x_i, y_i)$, is transformed into
$Q_1Q_2Q_3$, then the areas of the triangles are

$$\pm\frac12 \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}$$

and

$$\pm\frac12 \begin{vmatrix} \alpha x_1 + \beta y_1 & \gamma x_1 + \delta y_1 & 1 \\ \alpha x_2 + \beta y_2 & \gamma x_2 + \delta y_2 & 1 \\ \alpha x_3 + \beta y_3 & \gamma x_3 + \delta y_3 & 1 \end{vmatrix} = \pm\frac12 (\alpha\delta - \beta\gamma) \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}.$$

† The $\delta$ of this paragraph has no connexion with the $\delta$ of (i), which reappears below.

Thus areas of triangles are multiplied by the constant factor $|\alpha\delta - \beta\gamma|$; and
the same is true of areas in general, since these are sums, or limits of sums,
of areas of triangles.

We can therefore generalize any property of the fundamental lattice by 
an appropriate linear transformation. The generalization of Theorem 38 is 

Theorem 41. Suppose that $\Lambda$ is any lattice with origin O, and that $R_O$
satisfies (with respect to $\Lambda$) the conditions stated in Theorem 38. Then the
area of $R_O$ does not exceed that of the fundamental parallelogram of $\Lambda$.

It is convenient also to give a proof ab initio which we state at length,
since we use similar ideas in our proof of the next theorem. The proof, on
the lines of (i) above, is practically the same as that in § 3.10.

The lines

$$x = \pm n, \qquad y = \pm n$$

define a parallelogram $\Pi$ of area $4n^2\delta$, with $(2n + 1)^2$ points P of $\Lambda$ inside
it or on its boundary. We consider the $(2n + 1)^2$ regions $R_P$ corresponding
to these points. If A is the greatest value of |x| or |y| on $C_O$, then all these
regions lie inside the parallelogram $\Pi'$, of area $4(n + A)^2\delta$, bounded by the
lines

$$x = \pm(n + A), \qquad y = \pm(n + A);$$

and

$$(2n + 1)^2 \Delta \le 4(n + A)^2 \delta.$$

Hence, making $n \to \infty$, we obtain

$$\Delta \le \delta.$$

We need one more theorem which concerns the limiting case $\Delta = \delta$. We
suppose that $R_O$ is a parallelogram; what we prove on this hypothesis will
be sufficient for our purposes in Ch. XXIV.

We say that two points (x, y) and (x', y') are equivalent with respect to
L if they have similar positions in two parallelograms of L (so that they
would coincide if one parallelogram were moved into coincidence with the
other by parallel displacement). If L is based upon OP and OQ, and P and
Q are $(x_1, y_1)$ and $(x_2, y_2)$, then the conditions that the points (x, y) and
(x', y') should be equivalent are that

$$x' - x = rx_1 + sx_2, \qquad y' - y = ry_1 + sy_2,$$

where r and s are integers.

Theorem 42. If $R_O$ is a parallelogram whose area is equal to that of the
fundamental parallelogram of L, and there are no two equivalent points
inside $R_O$, then there is a point, inside $R_O$ or on its boundary, equivalent
to any given point of the plane.

We denote the closed region corresponding to $R_P$ by $R_P^*$.

The hypothesis that $R_O$ includes no pair of equivalent points is equivalent
to the hypothesis that no two $R_P$ overlap. The conclusion that there is a point
of $R_O^*$ equivalent to any point of the plane is equivalent to the conclusion
that the $R_P^*$ cover the plane. Hence what we have to prove is that, if $\Delta = \delta$
and the $R_P$ do not overlap, then the $R_P^*$ cover the plane.

Suppose the contrary. Then there is a point Q outside all $R_P^*$. This point
Q lies inside or on the boundary of some parallelogram of L, and there is a
region D, in this parallelogram, and of positive area $\eta$, outside all $R_P$; and
a corresponding region in every parallelogram of L. Hence the area of all
$R_P$, inside the parallelogram $\Pi'$ of area $4(n + A)^2\delta$, does not exceed

$$4(\delta - \eta)(n + A + 1)^2.$$

It follows that

$$(2n + 1)^2 \delta \le 4(\delta - \eta)(n + A + 1)^2;$$

and therefore, making $n \to \infty$,

$$\delta \le \delta - \eta,$$

a contradiction which proves the theorem.

Finally, we may remark that all these theorems may be extended to
space of any number of dimensions. Thus if $\Lambda$ is the fundamental point-lattice
in three-dimensional space, i.e. the set of points (x, y, z) with integral
coordinates, R is a convex region symmetrical about the origin, and of
volume greater than 8, then there are points of $\Lambda$, other than O, in R. In n
dimensions 8 must be replaced by $2^n$. We shall say something about this
generalization, which does not require new ideas, in Ch. XXIV.

NOTES

§ 3.1. The history of ‘Farey series’ is very curious. Theorems 28 and 29 seem to have 
been stated and proved first by Haros in 1802; see Dickson, History , i. 156. Farey did not 
publish anything on the subject until 1816, when he stated Theorem 29 in a note in the 
Philosophical Magazine. He gave no proof, and it is unlikely that he had found one, since 
he seems to have been at the best an indifferent mathematician. 

Cauchy, however, saw Farey’s statement, and supplied the proof (Exercices de mathématiques,
i. 114-16). Mathematicians generally have followed Cauchy’s example in attributing
the results to Farey, and the series will no doubt continue to bear his name.

See Rademacher, Lectures in elementary number theory (New York, Blaisdell, 1964), 
for a fuller account of Farey series and Huxley, Acta Arith. 18 (1971), 281-7 and Hall, 
J. London Math. Soc. (2) 2 (1970), 139-48 for more details. 

§ 3.3. Hurwitz, Math. Annalen. 44 (1894), 417-36. Professor H. G. Diamond drew my 
attention to the incompleteness of our proof in earlier editions. 

§ 3.4. Landau, Vorlesungen, i. 98-100. 

§§ 3.5-7. Here we follow the lines of a lecture by Professor Pólya.

§ 3.8. For Theorem 36 see Landau, Vorlesungen , i. 100. 

§ 3.9. The reader need not pay much attention to the definitions of ‘region’, ‘boundary’, 
etc., given in this section if he does not wish to; he will not lose by thinking in terms 
of elementary regions such as parallelograms, polygons, or ellipses. Convex regions are 
simple regions involving no ‘topological’ difficulties. That a convex region has an area was 
first proved by Minkowski ( Geometrie der Zahlen , Kap. 2). 

§ 3.10. Minkowski’s first proof will be found in Geometrie der Zahlen, 73-76, and
his second in Diophantische Approximationen, 28-30. Mordell’s proof was given in Compositio
Math. 1 (1934), 248-53. Another interesting proof is that by Hajós, Acta Univ.
Hungaricae (Szeged), 6 (1934), 224-5: this was set out in full in the first edition of this
book.

IRRATIONAL NUMBERS

4.1. Some generalities. The theory of ‘irrational number’, as explained 
in text books of analysis, falls outside the range of arithmetic. The theory 
of numbers is occupied, first with integers, then with rationals, as relations 
between integers, and then with irrationals, real or complex, of special 
forms, such as

$$r + s\sqrt{2}, \qquad r + s\sqrt{-5},$$

where r and s are rational. It is not properly concerned with irrationals as
a whole or with general criteria for irrationality (though this is a limitation
which we shall not always respect).

There are, however, many problems of irrationality which may be
regarded as part of arithmetic. Theorems concerning rationals may be
restated as theorems about integers; thus the theorem

‘$r^3 + s^3 = 3$ is insoluble in rationals’

may be restated in the form

‘$a^3d^3 + b^3c^3 = 3b^3d^3$ is insoluble in integers’:

and the same is true of many theorems in which ‘irrationality’ intervenes.
Thus

(P) ‘$\sqrt{2}$ is irrational’

means

(Q) ‘$a^2 = 2b^2$ is insoluble in integers’,

and then appears as a properly arithmetical theorem. We may ask ‘is $\sqrt{2}$
irrational?’ without trespassing beyond the proper bounds of arithmetic,
and need not ask ‘what is the meaning of $\sqrt{2}$?’ We do not require any
interpretation of the isolated symbol $\sqrt{2}$, since the meaning of (P) is defined
as a whole and as being the same as that of (Q).†

In this chapter we shall be occupied with the problem 

‘is x rational or irrational?’, 

x being a number which, like $\sqrt{2}$, e, or $\pi$, makes its appearance naturally
in analysis.


† In short $\sqrt{2}$ may be treated here as an ‘incomplete symbol’ in the sense of Principia Mathematica.

4.2. Numbers known to be irrational. The problem which we are considering
is generally difficult, and there are few different types of numbers
x for which the solution has been found. In this chapter we shall confine 
our attention to a few of the simplest cases, but it may be convenient to 
begin by a rough general statement of what is known. The statement must 
be rough because any more precise statement requires ideas which we have 
not yet defined. 

There are, broadly, among numbers which occur naturally in analysis, 
two types of numbers whose irrationality has been established. 

(a) Algebraic irrationals. The irrationality of $\sqrt{2}$ was proved by
Pythagoras or his pupils, and later Greek mathematicians extended the
conclusion to $\sqrt{3}$ and other square roots. It is now easy to prove that

$$\sqrt[m]{N}$$

is generally irrational for integral m and N. Still more generally, numbers
defined by algebraic equations with integral coefficients, unless ‘obviously’
rational, can be shown to be irrational by the use of a theorem of Gauss.
We prove this theorem (Theorem 45) in § 4.3.

(b) The numbers e and $\pi$ and numbers derived from them. It is easy to
prove e irrational (see § 4.7); and the proof, simple as it is, involves the
ideas which are most fundamental in later extensions of the theorem. $\pi$
is irrational, but of this there is no really simple proof. All powers of e
or $\pi$, and polynomials in e or $\pi$ with rational coefficients, are irrational.
Numbers such as

$$e^{\sqrt{2}}, \quad e^{\sqrt{5}}, \quad \sqrt{7}\,e^{\sqrt[3]{2}}, \quad \log 2$$

are irrational. We shall return to this subject in Ch. XI (§§ 11.13-14).

It was not until 1929 that theorems were discovered which go beyond
those of §§ 11.13-14 in any very important way. It has been shown recently
that further classes of numbers, in which

$$e^{\pi}, \quad 2^{\sqrt{2}}, \quad e^{\pi\sqrt{2}}, \quad e^{\pi} + \pi$$

are included, are irrational. The irrationality of such numbers as

$$2^e, \quad \pi^e, \quad \pi^{\sqrt{2}}, \quad e + \pi$$

or ‘Euler’s constant’† $\gamma$ is still unproved.

† $\gamma = \lim_{n \to \infty} \left( 1 + \frac12 + \cdots + \frac1n - \log n \right)$.

4.3. The theorem of Pythagoras and its generalizations. We shall
begin by proving

Theorem 43 (Pythagoras’ theorem). $\sqrt{2}$ is irrational.

We shall give two proofs of this theorem. The theorem and its sim- 
plest generalizations, though trivial now, deserve intensive study. The old 
Greek theory of proportion was based on the hypothesis that magnitudes of 
the same kind were necessarily commensurable, and it was the discovery 
of Pythagoras which, by exposing the inadequacy of this theory, opened 
the way for the more profound theory of Eudoxus which is set out in 
Euclid v. 

(i) First proof. If $\sqrt{2}$ is rational, then the equation

$$a^2 = 2b^2 \tag{4.3.1}$$

is soluble in integers a, b with (a, b) = 1. Hence $b \mid a^2$ and therefore $p \mid a^2$
for any prime factor p of b. It follows that $p \mid a$. Since (a, b) = 1, this is
impossible. Hence b = 1 and this also is clearly false.

(ii) Second proof. The traditional proof ascribed to Pythagoras runs as
follows. From (4.3.1), we see that $a^2$ is even and therefore that a is even,
i.e. a = 2c. Hence $b^2 = 2c^2$ and b is also even, contrary to the hypothesis
that (a, b) = 1.

The two proofs are very similar but there is an important difference. In 
(ii) we consider divisibility by 2, a given number. Clearly, if $2 \mid a^2$, then $2 \mid a$,
since the square of an odd number is certainly odd. In (i), on the other hand,
we consider divisibility by the unknown prime p and, in fact, we assume 
Theorem 3. Thus (ii) is the logically simpler proof, while, as we shall see 
in a moment, (i) lends itself more readily to generalization. 

We now prove the more general 

Theorem 44. $\sqrt[m]{N}$ is irrational, unless N is the m-th power of an integer n.

(iii) Suppose that

$$a^m = Nb^m, \tag{4.3.2}$$

where (a, b) = 1. Then $b \mid a^m$, and $p \mid a^m$ for every prime factor p of b. Hence
$p \mid a$, and from this it follows as before that b = 1. It will be observed that
this proof is almost the same as the first proof of Theorem 43.

(iv) To prove Theorem 44 for m = 2 without using Theorem 3, we suppose
that

$$\sqrt{N} = a + \frac{b}{c},$$

where a, b, c are integers, 0 < b < c and b/c is the fraction with least
numerator for which this is true. Hence

$$c^2N = (ca + b)^2 = a^2c^2 + 2abc + b^2$$

and so $c \mid b^2$, i.e. $b^2 = cd$. Hence

$$\frac{b}{c} = \frac{d}{b}$$

and 0 < d < b, a contradiction. It follows that $\sqrt{N}$ is integral or irrational.
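
In computational terms, Theorem 44 reduces the question whether the m-th root of a positive integer N is irrational to the question whether N is an exact m-th power. A minimal sketch, with hypothetical function names, using a binary search in exact integer arithmetic:

```python
def integer_root(N, m):
    """Return the positive integer n with n**m == N, or None if there is none.

    Assumes N >= 1 and m >= 1.
    """
    lo, hi = 1, N
    while lo <= hi:
        mid = (lo + hi) // 2
        power = mid ** m
        if power == N:
            return mid
        if power < N:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def mth_root_is_irrational(N, m):
    """By Theorem 44, the m-th root of N is irrational unless N is an m-th power."""
    return integer_root(N, m) is None
```

Thus the square root of 2 and the cube root of 5 are reported irrational, while the cube root of 8 is the integer 2.
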
A still more general theorem is 

Theorem 45. If x is a root of an equation

$$x^m + c_1x^{m-1} + \cdots + c_m = 0,$$

with integral coefficients of which the first is unity, then x is either integral
or irrational.

In the particular case in which the equation is

$$x^m - N = 0,$$

Theorem 45 reduces to Theorem 44.

We may plainly suppose that $c_m \ne 0$. We argue as under (iii) above.

If x = a/b, where (a, b) = 1, then

$$a^m + c_1a^{m-1}b + \cdots + c_mb^m = 0.$$

Hence $b \mid a^m$, and from this it follows as before that b = 1.
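
Theorem 45 makes the rational roots of such an equation easy to list, since they must be integers. The sketch below adds one elementary fact that is not stated in the text: an integral root x divides the last non-zero coefficient, because every other term of the equation is a multiple of x. The function name and the input convention (the list of coefficients after the leading 1) are assumptions of this illustration.

```python
def rational_roots_of_monic(coeffs):
    """Rational roots of x^m + c_1 x^(m-1) + ... + c_m, given [c_1, ..., c_m].

    By Theorem 45 every rational root is an integer, so only the integral
    divisors of the last non-zero coefficient need to be tried.
    """
    poly = [1] + list(coeffs)
    roots = set()
    while len(poly) > 1 and poly[-1] == 0:   # factor out x: 0 is a root
        roots.add(0)
        poly.pop()

    def value(x):
        acc = 0
        for c in poly:                       # Horner's rule
            acc = acc * x + c
        return acc

    const = abs(poly[-1])
    for d in range(1, const + 1):
        if const % d == 0:
            for x in (d, -d):
                if value(x) == 0:
                    roots.add(x)
    return sorted(roots)
```

For the equation with coefficients [0, -2], that is $x^2 - 2 = 0$, the list of rational roots is empty, which is Theorem 43 again.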

It is possible to prove Theorem 44 for general m and Theorem 45 also 
without using Theorem 3, but the argument is somewhat longer. 
4.4. The use of the fundamental theorem in the proofs of Theorems
43-45. It is important, in view of the historical discussion in the next
section, to observe what use is made, in the proofs of § 4.3, of the
fundamental theorem of arithmetic or of the ‘equivalent’ Theorem 3.

The critical inference, in the proof (iii) of Theorem 44, is 


‘$p \mid a^m \to p \mid a$’.

Here we use Theorem 3. The same remark applies to the first proof of 
Theorem 43, the only simplification being that m = 2. In these proofs 
Theorem 3 plays an essential part. 

The situation is different in the second proof of Theorem 43, since here 
we are considering divisibility by the special number 2. We need
‘$2 \mid a^2 \to 2 \mid a$’, and this can be proved by ‘enumeration of cases’ and without an
appeal to Theorem 3. Since

$(2s + 1)^2 = 4s^2 + 4s + 1$,


the square of an odd number is odd, as we remarked, and the conclusion 
follows. 

We can use a similar enumeration of cases to prove Theorem 44 for any 
special m and N. Suppose, for example, that m = 2, N = 5. We need 
‘$5 \mid a^2 \to 5 \mid a$’. Now any number a which is not a multiple of 5 is of one
of the forms 5m + 1, 5m + 2, 5m + 3, 5m + 4, and the squares of these 
numbers leave remainders 1, 4, 4, 1 after division by 5. 
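The enumeration of cases is easily mechanized. A minimal Python sketch (the helper name `nonzero_square_residues` is ours, introduced only for illustration) lists the remainders of the squares for the divisor 5:

```python
def nonzero_square_residues(d):
    """Remainders of a^2 on division by d, for a = 1, ..., d - 1."""
    return [(a * a) % d for a in range(1, d)]

# Remainders for a of the forms 5m + 1, 5m + 2, 5m + 3, 5m + 4.
print(nonzero_square_residues(5))
# 5 | a^2 implies 5 | a, since none of these remainders is 0.
print(all(r != 0 for r in nonzero_square_residues(5)))
```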

If m = 2, N = 6, we argue with 2, the smallest prime factor of 6, and 
the proof is almost identical with the second proof of Theorem 43. With 
m = 2 and 


N = 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 18,

we argue with the divisors

d = 2, 3, 5, 2, 7, 4, 2, 11, 3, 13, 2, 3, 17, 2,

the smallest prime factors of N which occur in odd multiplicity or, in the 
case of 8, an appropriate power of this prime factor. It is instructive to work 
through some of these cases; it is only when N is prime that the proof runs 
exactly according to the original pattern, and then it becomes tedious for 
the larger values of N. 

We can deal similarly with cases such as m = 3, N = 2, 3, or 5; but we 
confine ourselves to those which are relevant in §§ 4.5-6. 
4.5. A historical digression. It is unknown when, or by whom, the
‘theorem of Pythagoras’ was discovered. ‘The discovery’, says Heath,*
‘can hardly have been made by Pythagoras himself, but it was certainly
made in his school.’ Pythagoras lived about 570-490 B.C. Democritus,
born about 470, wrote ‘on irrational lines and solids’, and ‘it is difficult
to resist the conclusion that the irrationality of $\sqrt{2}$ was discovered before
Democritus’ time’.

It would seem that no extension of the theorem was made for over fifty 
years. There is a famous passage in Plato’s Theaetetus in which it is stated 
that Theodorus (Plato’s teacher) proved the irrationality of 

$\sqrt{3}, \sqrt{5}, \ldots,$

‘taking all the separate cases up to the root of 17 square feet, at which point,
for some reason, he stopped’. We have no accurate information about this
or other discoveries of Theodorus, but Plato lived 429-348, and it seems
reasonable to date this discovery about 410-400.

The question how Theodorus proved his theorems has exercised the 
ingenuity of every historian. It would be natural to conjecture that he used 
some modification of the ‘traditional’ method of Pythagoras, such as those 
which we discussed in the last section. In that case, since he cannot have
known the fundamental theorem,† and it is unlikely that he knew even
Euclid’s Theorem 3, he may have argued much as we argued at the end
of § 4.4. The objections to this (made by historians such as Zeuthen and
Heath) are (i) that it is so obvious an adaptation of the proof for $\sqrt{2}$ that it
would not be regarded as new and (ii) that it would be clear, long before
$\sqrt{17}$ was reached, that it was generally applicable. Against this, however,
it is fair to remark that Theodorus would have to consider each different
d anew and that the work would become notably laborious at $\sqrt{11}$, $\sqrt{13}$,
and $\sqrt{17}$ (and behind $\sqrt{17}$ lurk $\sqrt{19}$ and $\sqrt{23}$).

There are, however, two other hypotheses as to Theodorus’ method of
proof. These methods become notably more complicated, one at $\sqrt{17}$ and
the other at $\sqrt{19}$. Which of these is to be preferred depends on the exact
meaning of the Greek word μέχρι, translated as ‘up to’ by Heath; does
it mean ‘up to but not including’ or ‘up to and including’ (the American
usage of ‘through’)? Classical scholars tell me that the former is the more
probable and, if so, the following method, proposed by McCabe, is a
very likely one. It has the merit of depending essentially on the distinction
between odd and even, a matter of great importance in Greek mathematics.

* Sir Thomas Heath, A manual of Greek mathematics, 54-55. In what follows passages in inverted
commas, unless attributed to other writers, are quotations from this book or from the same writer’s
A history of Greek mathematics.

† See Ch. XII, § 12.5, for some further discussion of this point.

Considering $\sqrt{N}$ for successive values of N, Theodorus could ignore
N = 4n, since he would already have dealt with $\sqrt{n}$. The other even values
of N take the form 2(2n + 1) and the proof for $\sqrt{2}$ extends to this at once.
We have therefore only to consider odd N. For such N, if $\sqrt{N} = a/b$ and
(a, b) = 1, we have $N b^2 = a^2$ and a and b must both be odd. We write
a = 2A + 1 and b = 2B + 1 and so obtain

$N(2B + 1)^2 = (2A + 1)^2$.

The number N must be of one of the forms

4n + 3, 8n + 5, 8n + 1.

If N = 4n + 3, we multiply out, divide by 2 and obtain

$8nB(B + 1) + 6B(B + 1) + 2n + 1 = 2A(A + 1)$,

an impossibility, since one side is odd and the other even. If N = 8n + 5,
we again multiply out, divide by 4 and have

$8nB(B + 1) + 5B(B + 1) + 2n + 1 = A(A + 1)$,

again impossible, since A(A + 1) and B(B + 1) are each even.

There remain the numbers of the form 8n + 1, which are 1, 9, 17, ....
Of these, 1 and 9 are trivial and a difficulty first arises at N = 17. Arguing
as before, we reach the equation

$17(B^2 + B) + 4 = A^2 + A$,

both sides being even. We have then to consider a variety of possibilities
and the whole problem becomes much more complicated. (The reader may
care to try them.) Hence, if this were Theodorus’ method, he would very
naturally stop just short of $\sqrt{17}$.
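The parity argument can be restated in modern terms. The restatement by residues (mod 8) is an assumption of this sketch and is neither McCabe’s nor Theodorus’ formulation: when a and b are odd, $a^2$ and $b^2$ both leave remainder 1 on division by 8, so that $N b^2 = a^2$ forces N to leave remainder 1 also. A short Python check over the odd N below 20 (the function name is ours):

```python
def survives_mod_8(N):
    """True if N*b^2 and a^2 can agree (mod 8) for some odd a and b."""
    odd = (1, 3, 5, 7)
    return any((N * b * b - a * a) % 8 == 0 for a in odd for b in odd)

# Odd N below 20 which the test (mod 8) fails to exclude.
print([N for N in range(1, 20, 2) if survives_mod_8(N)])
```

The survivors are exactly the numbers of the form 8n + 1, in agreement with the discussion above.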

Zeuthen suggests an interesting method involving ratios which after a
few transformations begin to cycle endlessly, thus leading to a proof by
contradiction. This works well up to and including 17, while 18 is of course
trivial, but 19 requires 8 ratios before an endless chain begins. We give his
proof for $\sqrt{5}$ in § 4.6. But, even if μέχρι means ‘up to and including’ in
this passage, Plato might more reasonably have said ‘up to and including
18’. On balance, McCabe’s conjecture seems the most plausible.

4.6. Geometrical proof of the irrationality of $\sqrt{5}$. The proofs suggested
by Zeuthen vary from number to number, and the variations depend
at bottom on the form of the periodic continued fraction† which represents
$\sqrt{N}$. We take as typical the simplest case (N = 5).

We argue in terms of

$x = \frac{1}{2}(\sqrt{5} - 1)$.

Then

$x^2 = 1 - x$.

Geometrically, if AB = 1, AC = x, then

$AC^2 = AB \cdot CB$

[Fig. 4. The segment AB, with the points A, C_1, C_3, C_2, C, B marked in that order.]

and AB is divided ‘in golden section’ by C. These relations are fundamental
in the construction of the regular pentagon inscribed in a circle
(Euclid IV. 11).

If we divide 1 by x, taking the largest possible integral quotient, viz. 1,‡
the remainder is $1 - x = x^2$. If we divide x by $x^2$, the quotient is again 1
and the remainder is $x - x^2 = x^3$. We next divide $x^2$ by $x^3$, and continue
the process indefinitely; at each stage the ratios of the number divided, the
divisor, and the remainder are the same. Geometrically, if we take CC_1
equal and opposite to CB, CA is divided at C_1 in the same ratio as AB at C,
i.e. in golden section; if we take C_1C_2 equal and opposite to C_1A, then C_1C
is divided in golden section at C_2; and so on.§ Since we are dealing at each
stage with a segment divided in the same ratio, the process can never end.

It is easy to see that this contradicts the hypothesis of the rationality of
x. If x is rational, then AB and AC are integral multiples of the same length
δ, and the same is true of

$C_1C = CB = AB - AC, \quad C_1C_2 = AC_1 = AC - C_1C, \quad \ldots,$

i.e. of all the segments in the figure. Hence we can construct an infinite
sequence of descending integral multiples of δ, and this is plainly
impossible.
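By contrast, when x is rational the repeated division must stop. The following Python sketch (the helper name `division_steps` is an assumption of this example) carries out the process exactly on 1 and on the ratios of consecutive Fibonacci numbers, which approximate $\frac{1}{2}(\sqrt{5} - 1)$; the process always ends, but it takes longer and longer as the approximation improves.

```python
from fractions import Fraction

def division_steps(a, b):
    """Number of divisions performed by the Euclidean algorithm on (a, b)."""
    steps = 0
    while b != 0:
        a, b = b, a % b          # exact remainder; Fraction supports %
        steps += 1
    return steps

fib = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
for small, large in zip(fib, fib[1:]):
    x = Fraction(small, large)
    print(x, division_steps(Fraction(1), x))
```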

† See Ch. X, § 10.12.
‡ Since ½ < x < 1.
§ C_2C_3 equal and opposite to C_2C, C_3C_4 equal and opposite to C_3C_1, .... The new segments
defined are measured alternately to the left and the right.

4.7. Some more irrational numbers. We know, after Theorem 44,
that $\sqrt{7}$, $\sqrt[3]{2}$, $\sqrt[4]{11}$, ... are irrational. After Theorem 45, $x = \sqrt{2} + \sqrt{3}$ is
irrational, since it is not an integer and satisfies

$x^4 - 10x^2 + 1 = 0$.
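Both claims about $\sqrt{2} + \sqrt{3}$ can be checked with exact integer arithmetic. In the sketch below (an illustration only; the representation of $p + q\sqrt{6}$ by the pair (p, q) is a choice made for this example) we use $(\sqrt{2} + \sqrt{3})^2 = 5 + 2\sqrt{6}$ to evaluate the quartic, and then confirm that the equation has no integral root, since such a root would have to divide the constant term 1.

```python
def mul(u, v):
    """Product of p + q*sqrt(6) and r + s*sqrt(6), held as pairs of integers."""
    p, q = u
    r, s = v
    return (p * r + 6 * q * s, p * s + q * r)

x2 = (5, 2)                      # (sqrt2 + sqrt3)^2 = 5 + 2*sqrt6
x4 = mul(x2, x2)                 # (sqrt2 + sqrt3)^4
quartic = (x4[0] - 10 * x2[0] + 1, x4[1] - 10 * x2[1])
print(quartic)                   # both components vanish

# An integral root must divide the constant term, so only 1 and -1 qualify.
print([n for n in (1, -1) if n**4 - 10 * n**2 + 1 == 0])
```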


We can construct irrationals freely by means of decimals or continued 
fractions, as we shall see in Chs. IX and X; but it is not easy, without 
theorems such as we shall prove in §§ 11.13-14, to add to our list many of
the numbers which occur naturally in analysis. 

Theorem 46. $\log_{10} 2$ is irrational.

This is trivial, since

$\log_{10} 2 = \frac{a}{b}$

involves $2^b = 10^a$, which is impossible. More generally $\log_n m$ is irrational
if m and n are integers, one of which has a prime factor which the other
lacks.

Theorem 47. e is irrational. 

Let us suppose e rational, so that e = a/b where a and b are integers. If
$k \geq b$ and

$\alpha = k!\left(e - 1 - \frac{1}{1!} - \frac{1}{2!} - \dots - \frac{1}{k!}\right)$,

then $b \mid k!$ and α is an integer. But

$0 < \alpha = \frac{1}{k+1} + \frac{1}{(k+1)(k+2)} + \dots < \frac{1}{k+1} + \frac{1}{(k+1)^2} + \dots = \frac{1}{k}$,

and this is a contradiction.

In this proof, we assumed the theorem false and deduced that α was
(i) integral, (ii) positive, and (iii) less than one, an obvious contradiction.
We prove two further theorems by more sophisticated applications of the
same idea.
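The inequalities for α do not depend on the hypothesis that e is rational, and may be checked directly. The Python sketch below is an illustration under our own assumptions: the function name and the number of terms retained are arbitrary, and the series is truncated, which can only make the sum smaller.

```python
from fractions import Fraction
from math import factorial

def alpha_partial(k, terms=40):
    """First `terms` terms of alpha = k!/(k+1)! + k!/(k+2)! + ..., exactly."""
    return sum(Fraction(factorial(k), factorial(k + j))
               for j in range(1, terms + 1))

for k in (1, 2, 5, 10):
    a = alpha_partial(k)
    # alpha is positive and less than 1/k, so it cannot be an integer.
    print(k, float(a), 0 < a < Fraction(1, k))
```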

For any positive integer n, we write

$f = f(x) = \frac{x^n (1 - x)^n}{n!} = \frac{1}{n!} \sum_{m=n}^{2n} c_m x^m$,

where the $c_m$ are integers. For 0 < x < 1, we have

(4.7.1) $0 < f(x) < \frac{1}{n!}$.

Again f(0) = 0 and $f^{(m)}(0) = 0$ if m < n or m > 2n. But, if $n \leq m \leq 2n$,

$f^{(m)}(0) = \frac{m!}{n!} c_m$,

an integer. Hence f(x) and all its derivatives take integral values at x = 0.
Since f(1 − x) = f(x), the same is true at x = 1.
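The integrality of the derivatives at x = 0 can be seen in a small case. In the Python sketch below (the function name is ours) the coefficient $c_m$ is taken from the binomial expansion $x^n(1 - x)^n = \sum_j (-1)^j \binom{n}{j} x^{n+j}$, and the derivative is formed exactly as a fraction.

```python
from fractions import Fraction
from math import comb, factorial

def derivative_at_zero(n, m):
    """m-th derivative at 0 of f(x) = x^n (1 - x)^n / n!, computed exactly."""
    if m < n or m > 2 * n:
        return Fraction(0)
    c_m = (-1) ** (m - n) * comb(n, m - n)   # coefficient of x^m in x^n (1-x)^n
    return Fraction(factorial(m) * c_m, factorial(n))

n = 3
values = [derivative_at_zero(n, m) for m in range(0, 2 * n + 2)]
print(values)
print(all(v.denominator == 1 for v in values))
```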

Theorem 48. $e^y$ is irrational for every rational $y \neq 0$.

If y = h/k and $e^y$ is rational, so is $e^{ky} = e^h$. Again, if $e^{-h}$ is rational, so
is $e^h$. Hence it is enough to prove that, if h is a positive integer, $e^h$ cannot
be rational. Suppose this false, so that $e^h = a/b$ where a, b are positive
integers. We write

$F(x) = h^{2n} f(x) - h^{2n-1} f'(x) + \dots - h f^{(2n-1)}(x) + f^{(2n)}(x)$,

so that F(0) and F(1) are integers. We have

$\frac{d}{dx}\{e^{hx} F(x)\} = e^{hx}\{hF(x) + F'(x)\} = h^{2n+1} e^{hx} f(x)$.

Hence

$b \int_0^1 h^{2n+1} e^{hx} f(x)\,dx = b\left[e^{hx} F(x)\right]_0^1 = aF(1) - bF(0)$,

an integer. But, by (4.7.1),

$0 < b \int_0^1 h^{2n+1} e^{hx} f(x)\,dx < \frac{b h^{2n} e^h}{n!} < 1$

for large enough n, a contradiction.

Theorem 49. π and π² are irrational.

Suppose π² rational, so that π² = a/b, where a, b are positive integers.
We write

$G(x) = b^n \{\pi^{2n} f(x) - \pi^{2n-2} f''(x) + \pi^{2n-4} f^{(4)}(x) - \dots + (-1)^n f^{(2n)}(x)\}$,

so that G(0) and G(1) are integers. We have

$\frac{d}{dx}\{G'(x) \sin \pi x - \pi G(x) \cos \pi x\} = \{G''(x) + \pi^2 G(x)\} \sin \pi x$
$= b^n \pi^{2n+2} f(x) \sin \pi x = \pi^2 a^n \sin \pi x \, f(x)$.

Hence

$\pi \int_0^1 a^n \sin \pi x \, f(x)\,dx = \left[\frac{G'(x) \sin \pi x}{\pi} - G(x) \cos \pi x\right]_0^1 = G(0) + G(1)$,

an integer. But, by (4.7.1),

$0 < \pi \int_0^1 a^n \sin \pi x \, f(x)\,dx < \frac{\pi a^n}{n!} < 1$

for large enough n, a contradiction.


NOTES 

§ 4.2. The irrationality of e and π was proved by Lambert in 1761; and that of $e^\pi$ by
Gelfond in 1929. See the notes on Ch. XI.

§§ 4.3-6. A reader interested in Greek mathematics is referred to Heath’s books mentioned
on p. 42, to van der Waerden, Science awakening (Groningen, Noordhoff, 1954) and
to Knorr, Evolution of the Euclidean elements (Boston, Reidel, 1975). See McCabe, Math.
Mag. 49 (1976), 201-3 for his conjecture as to Theodorus’ method of proof.

We do not give specific references, nor attempt to assign Greek theorems to their real
discoverers. Thus we use ‘Pythagoras’ for ‘some mathematician of the Pythagorean school’.

§ 4.3. Sir Alexander Oppenheim found the proof (iv) of Theorem 44 (improved by
Prof. R. Rado) and the corresponding proof of Theorem 45 referred to at the end of § 4.3.
Theorem 45 is proved, in a more general form, by Gauss, D.A., § 42.

§ 4.7. Our proof of Theorem 48 is based on that of Hermite (Œuvres, 3, 154) and our
proof of Theorem 49 on that of Niven (Bulletin Amer. Math. Soc. 53 (1947), 509).

By Theorem 49

$\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$

is irrational, and by Theorem 205, $\zeta(4) = \pi^4/90$ is also irrational, as are the values of ζ(m)
for all even positive integers m. However when m is odd much less is known. Apéry
(1978) showed that ζ(3) is irrational; for a short proof see Beukers (Bull. London Math.
Soc. 11 (1979), 268-72). It is still unknown if ζ(5) is irrational. However Ball and Rivoal
(Inventiones Math. 146 (2001), 193-207) proved that the sequence ζ(3), ζ(5), ζ(7), ζ(9), ...
contains infinitely many irrational numbers.
CONGRUENCES AND RESIDUES

5.1. Highest common divisor and least common multiple. We have
already defined the highest common divisor (a, b) of two numbers a and
b. There is a simple formula for this number.

We denote by min(x, y) and max(x, y) the lesser and the greater of x and
y. Thus min(1, 2) = 1, max(1, 1) = 1.

Theorem 50. If

$a = \prod_p p^{\alpha} \quad (\alpha \geq 0)$,†

and

$b = \prod_p p^{\beta} \quad (\beta \geq 0)$,

then

$(a, b) = \prod_p p^{\min(\alpha, \beta)}$.

This theorem is an immediate consequence of Theorem 2 and the
definition of (a, b).

The least common multiple of two numbers a and b is the least positive
number which is divisible by both a and b. We denote it by {a, b}, so that

$a \mid \{a, b\}, \quad b \mid \{a, b\}$,

and {a, b} is the least number which has this property.

† The symbol

$\prod_p f(p)$

denotes a product extended over all prime values of p. The symbol

$\prod_{p \mid m} f(p)$

denotes a product extended over all primes which divide m. In the first formula of Theorem 50, α is
zero unless $p \mid a$ (so that the product is really a finite product). We might equally well write

$a = \prod_{p \mid a} p^{\alpha}$.

In this case every α would be positive.

Theorem 51. In the notation of Theorem 50,

$\{a, b\} = \prod_p p^{\max(\alpha, \beta)}$.

From Theorems 50 and 51 we deduce

Theorem 52:

$\{a, b\} = \frac{ab}{(a, b)}$.
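Theorems 50-52 translate directly into a computation. The following Python sketch is a hedged illustration: the helper names `exponents` and `hcd_and_lcm` are ours, and factorization by trial division is chosen only for brevity. It forms (a, b) and {a, b} from the minimum and maximum exponents and compares the result with the library function `math.gcd`.

```python
from math import gcd

def exponents(n):
    """Prime factorization of n >= 1 as a dict {p: exponent}, by trial division."""
    result, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            result[p] = result.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        result[n] = result.get(n, 0) + 1
    return result

def hcd_and_lcm(a, b):
    """Highest common divisor and least common multiple via Theorems 50 and 51."""
    ea, eb = exponents(a), exponents(b)
    hcd = lcm = 1
    for p in set(ea) | set(eb):
        alpha, beta = ea.get(p, 0), eb.get(p, 0)
        hcd *= p ** min(alpha, beta)
        lcm *= p ** max(alpha, beta)
    return hcd, lcm

a, b = 360, 84
d, l = hcd_and_lcm(a, b)
print(d, l, d == gcd(a, b), l == a * b // d)   # the last check is Theorem 52
```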


If (a, b) = 1, a and b are said to be prime to one another or coprime. 
The numbers a, b, c, ..., k are said to be coprime if every two of them are
coprime. To say this is to say much more than to say that

(a, b, c, ..., k) = 1,

which means merely that there is no number but 1 which divides all of
a, b, c, ..., k.

We shall sometimes say that ‘a and b have no common factor’ when we 
mean that they have no common factor greater than 1, i.e. that they are 
coprime. 

5.2. Congruences and classes of residues. If m is a divisor of x − a,
we say that x is congruent to a to modulus m, and write

$x \equiv a \pmod{m}$.

The definition does not introduce any new idea, since ‘$x \equiv a \pmod{m}$’ and
‘$m \mid x - a$’ have the same meaning, but each notation has its advantages. We
have already used the word ‘modulus’ in a different sense in § 2.9, but the
ambiguity will not cause any confusion.†

By $x \not\equiv a \pmod{m}$ we mean that x is not congruent to a.

If $x \equiv a \pmod{m}$, then a is called a residue of x to modulus m. If
$0 \leq a \leq m - 1$, then a is the least residue‡ of x to modulus m. Thus two
numbers a and b congruent (mod m) have the same residues (mod m). A
class of residues (mod m) is the class of all the numbers congruent to a given
residue (mod m), and every member of the class is called a representative
of the class. It is clear that there are in all m classes, represented by

0, 1, 2, ..., m − 1.

These m numbers, or any other set of m numbers of which one belongs to
each of the m classes, form a complete system of incongruent residues to
modulus m, or, more shortly, a complete system (mod m).

† The dual use has a purpose because the notion of a ‘congruence with respect to a modulus of
numbers’ occurs at a later stage in the theory, though we shall not use it in this book.
‡ Strictly, least non-negative residue.

Congruences are of great practical importance in everyday life. For 
example, ‘today is Saturday’ is a congruence property (mod 7) of the num- 
ber of days which have passed since some fixed date. This property is 
usually much more important than the actual number of days which have 
passed since, say, the creation. Lecture lists or railway guides are tables of 
congruences; in the lecture list the relevant moduli are 365, 7, and 24. 

To find the day of the week on which a particular event falls is to solve a
problem in ‘arithmetic (mod 7)’. In such an arithmetic congruent numbers
are equivalent, so that the arithmetic is a strictly finite science, and all
problems in it can be solved by trial. Suppose, for example, that a lecture is
given on every alternate day (including Sundays), and that the first lecture
occurs on a Monday. When will a lecture first fall on a Tuesday? If this
lecture is the (x + 1)th then

$2x \equiv 1 \pmod{7}$;

and we find by trial that the least positive solution is

x = 4.

Thus the fifth lecture will fall on a Tuesday and this will be the first that
will do so.

Similarly, we find by trial that the congruence

$x^2 \equiv 1 \pmod{8}$

has just four solutions, namely

$x \equiv 1, 3, 5, 7 \pmod{8}$.
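Solution ‘by trial’ means no more than running through a complete system of residues. A minimal Python sketch (the function name `solve_by_trial` is an assumption of the example) treats the two congruences just considered:

```python
def solve_by_trial(condition, m):
    """All residues x in 0, 1, ..., m - 1 which satisfy the given condition."""
    return [x for x in range(m) if condition(x)]

# 2x = 1 (mod 7): the lecture problem.
print(solve_by_trial(lambda x: (2 * x - 1) % 7 == 0, 7))
# x^2 = 1 (mod 8).
print(solve_by_trial(lambda x: (x * x - 1) % 8 == 0, 8))
```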


It is sometimes convenient to use the notation of congruences even when
the variables which occur in them are not integers. Thus we may write

$x \equiv y \pmod{z}$

whenever x − y is an integral multiple of z, so that, for example,

$\tfrac{3}{2} \equiv \tfrac{1}{2} \pmod{1}, \quad -\pi \equiv \pi \pmod{2\pi}$.

5.3. Elementary properties of congruences. It is obvious that congruences
to a given modulus m have the following properties:

(i) $a \equiv b \to b \equiv a$,

(ii) $a \equiv b \,.\, b \equiv c \to a \equiv c$,

(iii) $a \equiv a' \,.\, b \equiv b' \to a + b \equiv a' + b'$.

Also, if $a \equiv a'$, $b \equiv b'$, ... we have

(iv) $ka + lb + \dots \equiv ka' + lb' + \dots$,

(v) $a^2 \equiv a'^2$, $a^3 \equiv a'^3$,

and so on; and finally, if φ(a, b, ...) is any polynomial with integral
coefficients, we have

(vi) $\phi(a, b, \dots) \equiv \phi(a', b', \dots)$.

Theorem 53. If $a \equiv b \pmod{m}$ and $a \equiv b \pmod{n}$, then

$a \equiv b \pmod{\{m, n\}}$.

In particular, if (m, n) = 1, then

$a \equiv b \pmod{mn}$.

This follows from Theorem 50. If $p^c$ is the highest power of p which
divides {m, n}, then $p^c \mid m$ or $p^c \mid n$ and so $p^c \mid (a - b)$. This is true for every
prime factor of {m, n}, and so

$a \equiv b \pmod{\{m, n\}}$.

The theorem generalizes in the obvious manner to any number of 
congruences. 

5.4. Linear congruences. The properties (i)-(vi) are like those of
equations in ordinary algebra, but we soon meet with a difference. It is
not true that

$ka \equiv ka' \to a \equiv a'$;

for example

$2 \cdot 2 \equiv 2 \cdot 4 \pmod{4}$,

but

$2 \not\equiv 4 \pmod{4}$.

We consider next what is true in this direction.

Theorem 54. If (k, m) = d, then

$ka \equiv ka' \pmod{m} \to a \equiv a' \pmod{\tfrac{m}{d}}$,

and conversely.

Since (k, m) = d, we have

$k = k_1 d, \quad m = m_1 d, \quad (k_1, m_1) = 1$.

Then

$\frac{ka - ka'}{m} = \frac{k_1(a - a')}{m_1}$,

and, since $(k_1, m_1) = 1$,

$m \mid ka - ka' \;\equiv\; m_1 \mid a - a'$.†

This proves the theorem. A particular case is

Theorem 55. If (k, m) = 1, then

$ka \equiv ka' \pmod{m} \to a \equiv a' \pmod{m}$

and conversely.

Theorem 56. If $a_1, a_2, \dots, a_m$ is a complete system of incongruent
residues (mod m) and (k, m) = 1, then $ka_1, ka_2, \dots, ka_m$ is also such
a system.

For $ka_i - ka_j \equiv 0 \pmod{m}$ implies $a_i - a_j \equiv 0 \pmod{m}$, by
Theorem 55, and this is impossible unless i = j. More generally, if
(k, m) = 1, then

$ka_r + l \quad (r = 1, 2, 3, \dots, m)$

is a complete system of incongruent residues (mod m).

† ‘≡’ is the symbol of logical equivalence: if P and Q are propositions, then P ≡ Q if P → Q and
Q → P.

Theorem 57. If (k, m) = d, then the congruence

(5.4.1) $kx \equiv l \pmod{m}$

is soluble if and only if $d \mid l$. It has then just d solutions. In particular, if
(k, m) = 1, the congruence has always just one solution.

The congruence is equivalent to

$kx - my = l$,

so that the result is partly contained in Theorem 25. It is naturally to be
understood, when we say that the congruence has ‘just d’ solutions, that
congruent solutions are regarded as the same.

If d = 1, then Theorem 57 is a corollary of Theorem 56. If d > 1, the
congruence (5.4.1) is clearly insoluble unless $d \mid l$. If $d \mid l$, then

$m = dm', \quad k = dk', \quad l = dl'$,

and the congruence is equivalent to

(5.4.2) $k'x \equiv l' \pmod{m'}$.

Since (k', m') = 1, (5.4.2) has just one solution. If this solution is

$x \equiv t \pmod{m'}$,

then

$x = t + ym'$,

and the complete set of solutions of (5.4.1) is found by giving y all values
which lead to values of t + ym' incongruent to modulus m. Since

$t + ym' \equiv t + zm' \pmod{m} \;\equiv\; m \mid m'(y - z) \;\equiv\; d \mid (y - z)$,

there are just d solutions, represented by

$t, \; t + m', \; t + 2m', \; \dots, \; t + (d - 1)m'$.

This proves the theorem.
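The proof is constructive, and may be sketched as a short program. In the Python below the function name `solve_linear_congruence` is ours, and the inverse of k' to modulus m' is obtained from the built-in `pow(k1, -1, m1)` (available from Python 3.8) instead of by trial; this is a convenience of the example and not part of the argument.

```python
from math import gcd

def solve_linear_congruence(k, l, m):
    """All solutions x (mod m) of k*x = l (mod m), following Theorem 57."""
    d = gcd(k, m)
    if l % d != 0:
        return []                      # insoluble unless d | l
    k1, l1, m1 = k // d, l // d, m // d
    t = (l1 * pow(k1, -1, m1)) % m1    # the one solution of (5.4.2)
    return [t + y * m1 for y in range(d)]

print(solve_linear_congruence(2, 1, 7))    # the lecture problem of § 5.2
print(solve_linear_congruence(6, 4, 10))   # d = 2, two solutions
print(solve_linear_congruence(2, 1, 4))    # d = 2 does not divide 1
```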
5.5. Euler’s function φ(m). We denote by φ(m) the number of positive
integers not greater than and prime to m, that is to say the number of integers
n such that

$0 < n \leq m, \quad (n, m) = 1$.†

If a is prime to m, then so is any number x congruent to a (mod m). There
are φ(m) classes of residues prime to m, and any set of φ(m) residues, one
from each class, is called a complete set of residues prime to m. One such
complete set is the set of φ(m) numbers less than and prime to m.

Theorem 58. If $a_1, a_2, \dots, a_{\phi(m)}$ is a complete set of residues prime to
m, and (k, m) = 1, then

$ka_1, ka_2, \dots, ka_{\phi(m)}$

is also such a set.

For the numbers of the second set are plainly all prime to m, and, as in
the proof of Theorem 56, no two of them are congruent.

Theorem 59. Suppose that (m, m') = 1, and that a runs through a
complete set of residues (mod m), and a' through a complete set of
residues (mod m'). Then a'm + am' runs through a complete set of residues
(mod mm').

There are mm' numbers a'm + am'. If

$a_1'm + a_1m' \equiv a_2'm + a_2m' \pmod{mm'}$,

then

$a_1m' \equiv a_2m' \pmod{m}$,

and so

$a_1 \equiv a_2 \pmod{m}$;

and similarly

$a_1' \equiv a_2' \pmod{m'}$.

Hence the mm' numbers are all incongruent and form a complete set of
residues (mod mm').

† n can be equal to m only when n = 1. Thus φ(1) = 1.
A function f(m) is said to be multiplicative if (m, m') = 1 implies

$f(mm') = f(m) f(m')$.

Theorem 60. φ(n) is multiplicative.

If (m, m') = 1, then, by Theorem 59, a'm + am' runs through a complete
set (mod mm') when a and a' both run through complete sets (mod m) and
(mod m') respectively. Also

$(a'm + am', mm') = 1 \;\equiv\; (a'm + am', m) = 1 \,.\, (a'm + am', m') = 1$
$\equiv\; (am', m) = 1 \,.\, (a'm, m') = 1$
$\equiv\; (a, m) = 1 \,.\, (a', m') = 1$.

Hence the φ(mm') numbers less than and prime to mm' are the least positive
residues of the φ(m)φ(m') values of a'm + am' for which a is prime to m
and a' to m'; and therefore

$\phi(mm') = \phi(m)\phi(m')$.

Incidentally we have proved

Theorem 61. If (m, m') = 1, a runs through a complete set of residues
prime to m, and a' through a complete set of residues prime to m', then
am' + a'm runs through a complete set of residues prime to mm'.

We can now find the value of φ(m) for any value of m. By Theorem 60,
it is sufficient to calculate φ(m) when m is a power of a prime. Now there
are $p^c - 1$ positive numbers less than $p^c$, of which $p^{c-1} - 1$ are multiples
of p and the remainder prime to p. Hence

$\phi(p^c) = p^c - 1 - (p^{c-1} - 1) = p^c\left(1 - \frac{1}{p}\right)$;

and the general value of φ(m) follows from Theorem 60.
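The two ways of computing φ(m), from the definition and from the values at prime powers together with the multiplicative property, can be compared directly. A Python sketch follows; both function names are assumptions of the example.

```python
from math import gcd

def phi_by_count(m):
    """phi(m) from the definition: count n with 0 < n <= m and (n, m) = 1."""
    return sum(1 for n in range(1, m + 1) if gcd(n, m) == 1)

def phi_by_formula(m):
    """phi(m) as the product of p^c - p^(c-1) over the prime powers p^c in m."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            pc = 1
            while m % p == 0:
                pc *= p
                m //= p
            result *= pc - pc // p
        p += 1
    if m > 1:                      # one prime factor left, to the first power
        result *= m - 1
    return result

print([phi_by_formula(m) for m in range(1, 13)])
print(all(phi_by_count(m) == phi_by_formula(m) for m in range(1, 500)))
```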

Theorem 62. If $m = \prod p^c$, then

$\phi(m) = m \prod_{p \mid m} \left(1 - \frac{1}{p}\right)$.



We shall also require 
Theorem 63:

$\sum_{d \mid m} \phi(d) = m$.

If $m = \prod p^c$, then the divisors of m are the numbers $d = \prod p^{c'}$, where
$0 \leq c' \leq c$ for each p; and

$\Phi(m) = \sum_{d \mid m} \phi(d) = \sum_{p, c'} \prod \phi(p^{c'})$
$= \prod_p \{1 + \phi(p) + \phi(p^2) + \dots + \phi(p^c)\}$,

by the multiplicative property of φ(m). But

$1 + \phi(p) + \dots + \phi(p^c) = 1 + (p - 1) + p(p - 1) + \dots + p^{c-1}(p - 1) = p^c$,

so that

$\Phi(m) = \prod_p p^c = m$.
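Theorem 63 is readily verified for small m. The Python sketch below computes φ from its definition; the function names are ours and no attempt is made at efficiency.

```python
from math import gcd

def phi(m):
    """Euler's function, computed directly from its definition."""
    return sum(1 for n in range(1, m + 1) if gcd(n, m) == 1)

def divisor_sum_of_phi(m):
    """Sum of phi(d) over all divisors d of m."""
    return sum(phi(d) for d in range(1, m + 1) if m % d == 0)

print([divisor_sum_of_phi(m) for m in (1, 12, 17, 60)])
```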

5.6. Applications of Theorems 59 and 61 to trigonometrical sums.
There are certain trigonometrical sums which are important in the theory
of numbers and which are either ‘multiplicative’ in the sense of § 5.5 or
possess very similar properties.

We write†

$e(\tau) = e^{2\pi i \tau}$;

we shall be concerned only with rational values of τ. It is clear that

$e\left(\frac{m}{n}\right) = e\left(\frac{m'}{n}\right)$

when $m \equiv m' \pmod{n}$. It is this property which gives trigonometrical sums
their arithmetical importance.

† Throughout this section $e^{\zeta}$ is the exponential function $e^{\zeta} = 1 + \zeta + \dots$ of the complex variable
ζ. We assume a knowledge of the elementary properties of the exponential function.
(1) Multiplicative property of Gauss’s sum. Gauss’s sum, which is
particularly important in the theory of quadratic residues, is

$S(m, n) = \sum_{h=0}^{n-1} e^{2\pi i h^2 m/n} = \sum_{h=0}^{n-1} e\left(\frac{h^2 m}{n}\right)$.

Since

$e\left(\frac{(h + rn)^2 m}{n}\right) = e\left(\frac{h^2 m}{n}\right)$

for any r, we have

$e\left(\frac{h_1^2 m}{n}\right) = e\left(\frac{h_2^2 m}{n}\right)$

whenever $h_1 \equiv h_2 \pmod{n}$. We may therefore write

$S(m, n) = \sum_{h(n)} e\left(\frac{h^2 m}{n}\right)$,

the notation implying that h runs through any complete system of residues
mod n. When there is no risk of ambiguity, we shall write h instead of h(n).

Theorem 64. If (n, n') = 1, then

$S(m, nn') = S(mn', n)\, S(mn, n')$.

Let h, h' run through complete systems of residues to modulus n, n'
respectively. Then, by Theorem 59,

$H = hn' + h'n$

runs through a complete set of residues to modulus nn'. Also

$mH^2 = m(hn' + h'n)^2 \equiv mh^2n'^2 + mh'^2n^2 \pmod{nn'}$.

Hence

$S(mn', n)\, S(mn, n') = \left\{\sum_h e\left(\frac{h^2 mn'}{n}\right)\right\} \left\{\sum_{h'} e\left(\frac{h'^2 mn}{n'}\right)\right\}$
$= \sum_{h, h'} e\left(\frac{h^2 mn'}{n} + \frac{h'^2 mn}{n'}\right)$
$= \sum_{h, h'} e\left(\frac{m(h^2 n'^2 + h'^2 n^2)}{nn'}\right)$
$= \sum_H e\left(\frac{mH^2}{nn'}\right) = S(m, nn')$.
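Theorem 64 can be tested numerically. The Python sketch below is an illustration in floating-point arithmetic, so that equality is checked only to within a small tolerance; the particular values of m, n, n' are arbitrary choices, and the argument of e is reduced to modulus n before division in order to limit rounding error.

```python
import cmath

def e(num, den):
    """e(num/den) = exp(2*pi*i*num/den), with num first reduced (mod den)."""
    return cmath.exp(2j * cmath.pi * (num % den) / den)

def gauss_sum(m, n):
    """S(m, n): the sum of e(h^2 m / n) over h = 0, 1, ..., n - 1."""
    return sum(e(h * h * m, n) for h in range(n))

m, n, n1 = 3, 5, 8                     # (n, n1) = 1
lhs = gauss_sum(m, n * n1)
rhs = gauss_sum(m * n1, n) * gauss_sum(m * n, n1)
print(abs(lhs - rhs) < 1e-9)
```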


(2) Multiplicative property of Ramanujan’s sum. Ramanujan’s sum is

$c_q(m) = \sum_{h^*(q)} e\left(\frac{hm}{q}\right)$,

the notation here implying that h runs only through residues prime to q. We
shall sometimes write h instead of h*(q) when there is no risk of ambiguity.

We may write $c_q(m)$ in another form which introduces a notion of more
general importance. We call ρ a primitive q-th root of unity if $\rho^q = 1$ but
$\rho^r$ is not 1 for any positive value of r less than q.

Suppose that $\rho^q = 1$ and that r is the least positive integer for which
$\rho^r = 1$. Then q = kr + s, where $0 \leq s < r$. Also

$\rho^s = \rho^{q - kr} = 1$,

so that s = 0 and $r \mid q$. Hence

Theorem 65. Any q-th root of unity is a primitive r-th root, for some
divisor r of q.

Theorem 66. The q-th roots of unity are the numbers

$e\left(\frac{h}{q}\right) \quad (h = 0, 1, \dots, q - 1)$,

and a necessary and sufficient condition that the root should be primitive
is that h should be prime to q.

We may now write Ramanujan’s sum in the form

$c_q(m) = \sum \rho^m$,

where ρ runs through the primitive q-th roots of unity.

Theorem 67. If (q, q') = 1, then

$c_{qq'}(m) = c_q(m)\, c_{q'}(m)$.

For

$c_q(m)\, c_{q'}(m) = \sum_{h, h'} e\left\{m\left(\frac{h}{q} + \frac{h'}{q'}\right)\right\} = \sum_{h, h'} e\left\{\frac{m(hq' + h'q)}{qq'}\right\} = c_{qq'}(m)$,

by Theorem 61.
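A numerical check of Theorem 67, again in floating-point arithmetic and with arbitrarily chosen q, q', m (the function name is an assumption of the example):

```python
import cmath
from math import gcd

def ramanujan_sum(q, m):
    """c_q(m): the sum of e(h m / q) over h in 1, ..., q with (h, q) = 1."""
    return sum(cmath.exp(2j * cmath.pi * ((h * m) % q) / q)
               for h in range(1, q + 1) if gcd(h, q) == 1)

q, q1, m = 4, 9, 6                      # (q, q1) = 1
lhs = ramanujan_sum(q * q1, m)
rhs = ramanujan_sum(q, m) * ramanujan_sum(q1, m)
print(round(lhs.real), round(rhs.real), abs(lhs - rhs) < 1e-9)
```

The sums are real, as the output suggests, although the program makes no use of this fact.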

(3) Multiplicative property of Kloosterman’s sum. Kloosterman’s sum
(which is rather more recondite) is

$S(u, v, n) = \sum_h e\left(\frac{uh + v\bar{h}}{n}\right)$,

where h runs through a complete set of residues prime to n, and $\bar{h}$ is
defined by

$h\bar{h} \equiv 1 \pmod{n}$.

Theorem 57 shows us that, given any h, there is a unique $\bar{h}$ (mod n) which
satisfies this condition. We shall make no use of Kloosterman’s sum, but
the proof of its multiplicative property gives an excellent illustration of the
ideas of the preceding sections.

Theorem 68. If (n, n') = 1, then

$S(u, v, n)\, S(u, v', n') = S(u, V, nn')$,

where

$V = vn'^2 + v'n^2$.

If

$h\bar{h} \equiv 1 \pmod{n}, \quad h'\bar{h}' \equiv 1 \pmod{n'}$,

then

(5.6.1) $S(u, v, n)\, S(u, v', n') = \sum_{h, h'} e\left(\frac{uh + v\bar{h}}{n} + \frac{uh' + v'\bar{h}'}{n'}\right) = \sum_{h, h'} e\left(\frac{uH + K}{nn'}\right)$,

where

$H = hn' + h'n, \quad K = v\bar{h}n' + v'\bar{h}'n$.

By Theorem 61, H runs through a complete system of residues prime to
nn'. Hence, if we can show that

(5.6.2) $K \equiv V\bar{H} \pmod{nn'}$,

where $\bar{H}$ is defined by

$H\bar{H} \equiv 1 \pmod{nn'}$,

then (5.6.1) will reduce to

$S(u, v, n)\, S(u, v', n') = \sum_H e\left(\frac{uH + V\bar{H}}{nn'}\right) = S(u, V, nn')$.

Now

$(hn' + h'n)\bar{H} = H\bar{H} \equiv 1 \pmod{nn'}$.

Hence

$hn'\bar{H} \equiv 1 \pmod{n}, \quad n'\bar{H} \equiv h\bar{h}n'\bar{H} \equiv \bar{h} \pmod{n}$,

and so

(5.6.3) $n'^2\bar{H} \equiv n'\bar{h} \pmod{nn'}$.

Similarly we see that

(5.6.4) $n^2\bar{H} \equiv n\bar{h}' \pmod{nn'}$;

and from (5.6.3) and (5.6.4) we deduce

$V\bar{H} = (vn'^2 + v'n^2)\bar{H} \equiv vn'\bar{h} + v'n\bar{h}' = K \pmod{nn'}$.

This is (5.6.2), and the theorem follows.
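Theorem 68 may also be tested numerically. In the Python sketch below (an illustration only; the values of u, v, v', n, n' are arbitrary) the inverse $\bar{h}$ is obtained from the built-in `pow(h, -1, n)`, available from Python 3.8, and V is formed as in the statement of the theorem.

```python
import cmath
from math import gcd

def kloosterman_sum(u, v, n):
    """S(u, v, n): the sum of e((u h + v h_bar) / n) over h prime to n,
    where h * h_bar = 1 (mod n)."""
    total = 0
    for h in range(1, n + 1):
        if gcd(h, n) == 1:
            h_bar = pow(h, -1, n)          # inverse of h to modulus n
            total += cmath.exp(2j * cmath.pi * ((u * h + v * h_bar) % n) / n)
    return total

u, v, v1, n, n1 = 2, 3, 5, 7, 9            # (n, n1) = 1
V = v * n1**2 + v1 * n**2
lhs = kloosterman_sum(u, v, n) * kloosterman_sum(u, v1, n1)
rhs = kloosterman_sum(u, V, n * n1)
print(abs(lhs - rhs) < 1e-9)
```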

5.7. A general principle. We return for a moment to the argument 
which we used in proving Theorem 65. It will avoid a good deal of repeti- 
tion later if we restate the theorem and the proof in a more general form. We 
use P(a) to denote any proposition asserting a property of a non-negative 
integer a. 

Theorem 69. If 

(i) P(a) and P(b) imply P(a + b) and P(a − b), for every a and b
(provided, in the second case, that $b \leq a$),

(ii) r is the least positive integer for which P(r) is true, then 

(a) P ( kr ) is true for every non-negative integer k, 

(b) any q for which P(q) is true is a multiple of r. 

In the first place, (a) is obvious. 

To prove (b) we observe that $0 < r \leq q$, by the definition of r. Hence
we can write

$q = kr + s, \quad s = q - kr$,

where $k \geq 1$ and $0 \leq s < r$. But P(r) → P(kr), by (a), and

P(q) . P(kr) → P(s),

by (i). Hence, again by the definition of r, s must be 0, and q = kr.

We can also deduce Theorem 69 from Theorem 23. In Theorem 65, P(a)
is $\rho^a = 1$.
5.8. Construction of the regular polygon of 17 sides. We conclude
this chapter by a short excursus on one of the famous problems of elementary
geometry, that of the construction of a regular polygon of n sides, or
of an angle α = 2π/n.

Suppose that $(n_1, n_2) = 1$ and that the problem is soluble for $n = n_1$ and
for $n = n_2$. There are integers $r_1$ and $r_2$ such that

$r_1 n_1 + r_2 n_2 = 1$

or

$r_1 \alpha_2 + r_2 \alpha_1 = r_1 \frac{2\pi}{n_2} + r_2 \frac{2\pi}{n_1} = \frac{2\pi}{n_1 n_2}$.

Hence, if the problem is soluble for $n = n_1$ and $n = n_2$, it is soluble for
$n = n_1 n_2$. It follows that we need only consider cases in which n is a power
of a prime. In what follows we suppose n = p prime.

We can construct α if we can construct cos α (or sin α); and the numbers

$\cos k\alpha + i \sin k\alpha \quad (k = 1, 2, \dots, n - 1)$

are the roots of

(5.8.1) $\frac{x^n - 1}{x - 1} = x^{n-1} + x^{n-2} + \dots + 1 = 0$.

Hence we can construct α if we can construct the roots of (5.8.1).

‘Euclidean’ constructions, by ruler and compass, are equivalent analytically
to the solution of a series of linear or quadratic equations.† Hence
our construction is possible if we can reduce the solution of (5.8.1) to that
of such a series of equations.

The problem was solved by Gauss, who proved (as we stated in § 2.4)
that the reduction is possible if and only if n is a ‘Fermat prime’‡

$n = p = 2^{2^h} + 1 = F_h$.

The first five values of h, viz. 0, 1, 2, 3, 4, give

n = 3, 5, 17, 257, 65537,

all of which are prime, and in these cases the problem is soluble.

† See § 11.5.
‡ See § 2.5.

The constructions for n = 3 and n = 5 are familiar. We give here the
construction for n = 17. We shall not attempt any systematic exposition
of Gauss’s theory; but this particular construction gives a fair example of
the working of his method, and should make it plain to the reader that (as
is plausible from the beginning) success is to be expected when n = p and
p − 1 does not contain any prime but 2. This requires that p is a prime of
the form $2^m + 1$, and the only such primes are the Fermat primes.†
Suppose then that n = 17. The corresponding equation is 

(5.8.2) 2Lzl = x 16 + x 15 + • • • + 1 = 0. 

x — 1 

We write 




= cos ka + i sin ka. 


so that the roots of (5.8.2) are 


(5.8.3) x = ei,€ 2 ,...,ei 6 . 

From these roots we form certain sums, known as periods, which are the 
roots of quadratic equations. 

The numbers

    $3^m \quad (0 \le m \le 15)$

are congruent (mod 17), in some order, to the numbers k = 1, 2, ..., 16,‡
as is shown by the table

(5.8.4)  m = 0, 1, 2,  3,  4, 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,

(5.8.5)  k = 1, 3, 9, 10, 13, 5, 15, 11, 16, 14,  8,  7,  4, 12,  2,  6.
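The table can be reproduced mechanically. The following is a minimal sketch in Python (the variable name `powers` is ours, not part of the text); it lists $3^m$ (mod 17) for m = 0, ..., 15 and confirms that each of 1, ..., 16 occurs exactly once.

```python
# Residues of 3^m modulo 17 for m = 0, 1, ..., 15, as in (5.8.4)-(5.8.5).
powers = [pow(3, m, 17) for m in range(16)]

for m, k in enumerate(powers):
    print(f"m = {m:2d}   k = {k:2d}")

# Every non-zero residue (mod 17) appears exactly once among the powers of 3.
assert sorted(powers) == list(range(1, 17))
```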

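We define $x_1$ and $x_2$ by

    $x_1 = \sum_{m\ \mathrm{even}} \epsilon_k = \epsilon_1 + \epsilon_9 + \epsilon_{13} + \epsilon_{15} + \epsilon_{16} + \epsilon_8 + \epsilon_4 + \epsilon_2,$

    $x_2 = \sum_{m\ \mathrm{odd}} \epsilon_k = \epsilon_3 + \epsilon_{10} + \epsilon_5 + \epsilon_{11} + \epsilon_{14} + \epsilon_7 + \epsilon_{12} + \epsilon_6;$

and $y_1, y_2, y_3, y_4$ by

    $y_1 = \sum_{m \equiv 0\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_1 + \epsilon_{13} + \epsilon_{16} + \epsilon_4,$

    $y_2 = \sum_{m \equiv 2\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_9 + \epsilon_{15} + \epsilon_8 + \epsilon_2,$

    $y_3 = \sum_{m \equiv 1\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_3 + \epsilon_5 + \epsilon_{14} + \epsilon_{12},$

    $y_4 = \sum_{m \equiv 3\ (\mathrm{mod}\ 4)} \epsilon_k = \epsilon_{10} + \epsilon_{11} + \epsilon_7 + \epsilon_6.$

† See § 2.5, Theorem 17.

‡ In fact 3 is a ‘primitive root of 17’ in the sense which will be explained in § 6.8.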

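Since

    $\epsilon_k + \epsilon_{17-k} = 2\cos k\alpha$

we have

    $x_1 = 2(\cos\alpha + \cos 8\alpha + \cos 4\alpha + \cos 2\alpha),$

    $x_2 = 2(\cos 3\alpha + \cos 7\alpha + \cos 5\alpha + \cos 6\alpha),$

    $y_1 = 2(\cos\alpha + \cos 4\alpha), \quad y_2 = 2(\cos 8\alpha + \cos 2\alpha),$

    $y_3 = 2(\cos 3\alpha + \cos 5\alpha), \quad y_4 = 2(\cos 7\alpha + \cos 6\alpha).$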


We prove first that $x_1$ and $x_2$ are the roots of a quadratic equation with
rational coefficients. Since the roots of (5.8.2) are the numbers (5.8.3), we
have

    $x_1 + x_2 = 2\sum_{k=1}^{8} \cos k\alpha = \sum_{k=1}^{16} \epsilon_k = -1.$

Again,

    $x_1 x_2 = 4(\cos\alpha + \cos 8\alpha + \cos 4\alpha + \cos 2\alpha)(\cos 3\alpha + \cos 7\alpha + \cos 5\alpha + \cos 6\alpha).$

If we multiply out the right-hand side and use the identity

(5.8.6)    $2\cos m\alpha \cos n\alpha = \cos(m + n)\alpha + \cos(m - n)\alpha,$

we obtain

    $x_1 x_2 = 4(x_1 + x_2) = -4.$
Hence $x_1$ and $x_2$ are the roots of

(5.8.7)    $x^2 + x - 4 = 0.$

Also

    $\cos\alpha + \cos 2\alpha > 2\cos\tfrac14\pi = \sqrt2 > -\cos 8\alpha, \quad \cos 4\alpha > 0.$

Hence $x_1 > 0$ and therefore

(5.8.8)    $x_1 > x_2.$

We prove next that $y_1, y_2$ and $y_3, y_4$ are the roots of quadratic equations
whose coefficients are rational in $x_1$ and $x_2$. We have

    $y_1 + y_2 = x_1,$

and, using (5.8.6) again,

    $y_1 y_2 = 4(\cos\alpha + \cos 4\alpha)(\cos 8\alpha + \cos 2\alpha) = 2\sum_{k=1}^{8} \cos k\alpha = -1.$

Hence $y_1, y_2$ are the roots of

(5.8.9)    $y^2 - x_1 y - 1 = 0;$

and it is plain that

(5.8.10)    $y_1 > y_2.$

Similarly

    $y_3 + y_4 = x_2, \quad y_3 y_4 = -1,$

and so $y_3, y_4$ are the roots of

(5.8.11)    $y^2 - x_2 y - 1 = 0,$

and

(5.8.12)    $y_3 > y_4.$
Finally

    $2\cos\alpha + 2\cos 4\alpha = y_1,$

    $4\cos\alpha\cos 4\alpha = 2(\cos 5\alpha + \cos 3\alpha) = y_3.$

Also $\cos\alpha > \cos 4\alpha$. Hence $z_1 = 2\cos\alpha$ and $z_2 = 2\cos 4\alpha$ are the
roots of the quadratic

(5.8.13)    $z^2 - y_1 z + y_3 = 0$

and

(5.8.14)    $z_1 > z_2.$

We can now determine $z_1 = 2\cos\alpha$ by solving the four quadratics
(5.8.7), (5.8.9), (5.8.11), and (5.8.13), and remembering the associated
inequalities. We obtain

    $2\cos\alpha = \tfrac18\{-1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}}\} + \tfrac18\sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2(1 - \sqrt{17})\sqrt{34 - 2\sqrt{17}}},$

an expression involving only rationals and square roots. This number may
now be constructed by the use of the ruler and compass only, and so $\alpha$ may
be constructed.
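As a numerical check on the chain of quadratics, here is a small Python sketch. The helper names `larger_root`, `smaller_root`, `two_cos_alpha` and `closed_form` are our own; the code solves the quadratics labelled (5.8.7), (5.8.9), (5.8.11) and (5.8.13) in turn, choosing at each stage the root singled out by the associated inequality, and compares the result with $2\cos(2\pi/17)$ and with the closed expression in square roots. It uses floating-point arithmetic, so agreement is only to rounding error.

```python
import math


def larger_root(b, c):
    """Larger real root of t^2 + b*t + c = 0."""
    return (-b + math.sqrt(b * b - 4 * c)) / 2


def smaller_root(b, c):
    """Smaller real root of t^2 + b*t + c = 0."""
    return (-b - math.sqrt(b * b - 4 * c)) / 2


def two_cos_alpha():
    """2 cos(2*pi/17) obtained by solving four quadratics in succession."""
    x1 = larger_root(1, -4)     # (5.8.7):  x^2 + x - 4 = 0,      x1 > x2
    x2 = smaller_root(1, -4)
    y1 = larger_root(-x1, -1)   # (5.8.9):  y^2 - x1*y - 1 = 0,   y1 > y2
    y3 = larger_root(-x2, -1)   # (5.8.11): y^2 - x2*y - 1 = 0,   y3 > y4
    return larger_root(-y1, y3) # (5.8.13): z^2 - y1*z + y3 = 0,  z1 > z2


def closed_form():
    """The expression for 2 cos(2*pi/17) in rationals and square roots."""
    r17 = math.sqrt(17)
    s_minus = math.sqrt(34 - 2 * r17)
    s_plus = math.sqrt(34 + 2 * r17)
    inner = 68 + 12 * r17 - 16 * s_plus - 2 * (1 - r17) * s_minus
    return (-1 + r17 + s_minus) / 8 + math.sqrt(inner) / 8


target = 2 * math.cos(2 * math.pi / 17)
print(two_cos_alpha(), closed_form(), target)
```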

There is a simpler geometrical construction. Let C be the least positive
acute angle such that tan 4C = 4, so that C, 2C, and 4C are all acute. Then
(5.8.7) may be written

    $x^2 + 4x\cot 4C - 4 = 0.$

The roots of this equation are $2\tan 2C$, $-2\cot 2C$. Since $x_1 > x_2$, this gives
$x_1 = 2\tan 2C$ and $x_2 = -2\cot 2C$. Substituting in (5.8.9) and (5.8.11) and
solving, we obtain

    $y_1 = \tan(C + \tfrac14\pi), \quad y_3 = \tan C,$

    $y_2 = \tan(C - \tfrac14\pi), \quad y_4 = -\cot C.$

Hence

(5.8.15)
    $2\cos 3\alpha + 2\cos 5\alpha = y_3 = \tan C,$

    $2\cos 3\alpha \cdot 2\cos 5\alpha = 2\cos 2\alpha + 2\cos 8\alpha = y_2 = \tan(C - \tfrac14\pi).$

Now let OA, OB (Fig. 5) be two perpendicular radii of a circle. Make
OI one-fourth of OB and the angle OIE (with E in OA) one-fourth of the
angle OIA. Find on AO produced a point F such that $EIF = \tfrac14\pi$. Let the
circle on AF as diameter cut OB in K, and let the circle whose centre is E
and radius EK cut OA in $N_3$ and $N_5$ ($N_3$ on OA, $N_5$ on AO produced). Draw
$N_3P_3$, $N_5P_5$ perpendicular to OA to cut the circumference of the original
circle in $P_3$ and $P_5$.



Fig. 5. 

Then OIA = 4C and OIE = C. Also

    $2\cos AOP_3 + 2\cos AOP_5 = 2\,\frac{ON_3 - ON_5}{OA} = \frac{4\,OE}{OA} = \frac{OE}{OI} = \tan C,$

    $2\cos AOP_3 \cdot 2\cos AOP_5 = -4\,\frac{ON_3 \cdot ON_5}{OA^2} = -4\,\frac{OK^2}{OA^2} = -4\,\frac{OF}{OA} = -\frac{OF}{OI} = \tan(C - \tfrac14\pi).$

Comparing these equations with (5.8.15), we see that $AOP_3 = 3\alpha$ and
$AOP_5 = 5\alpha$. It follows that A, $P_3$, $P_5$ are the first, fourth, and sixth vertices
of a regular polygon of 17 sides inscribed in the circle; and it is obvious
how the polygon may be completed.
NOTES

§5.1. The contents of this chapter are all ‘classical’ (except the properties of Ramanujan’s 
and Kloosterman’s sums proved in § 5.6), and will be found in text-books. The theory of 
congruences was first developed scientifically by Gauss, D.A. , though the main results must 
have been familiar to earlier mathematicians such as Fermat and Euler. We give occasional 
references, especially when some famous function or theorem is habitually associated with 
the name of a particular mathematician, but make no attempt to be systematic. 

§ 5.5. Euler, Novi Comm. Acad . Petrop . 8 (1760-1), 74-104 [Opera (1), ii. 531-44]. 

It might seem more natural to say that f(m) is multiplicative if

    f(mm') = f(m)f(m')

for all m, m'. This definition would be too restrictive, and the less exacting definition of
the text is much more useful.

§ 5.6. The sums of this section occur in Gauss, ‘Summatio quarumdam serierum singu- 
larium’ (1808), Werke, ii. 11-45; Ramanujan, Trans. Camb. Phil. Soc. 22 (1918), 259-76 
{Collected Papers, 179-99); Kloosterman, Acta Math. 49 (1926), 407-64. ‘Ramanujan’s 
sum’ may be found in earlier writings; see, for example, Jensen, Beretning d. tredje Skand. 
Matematikercongres (1913), 145, and Landau, Handbuch , 572: but Ramanujan was the 
first mathematician to see its full importance and use it systematically. It is particularly 
important in the theory of the representation of numbers by sums of squares. For the 
evaluation of Gauss’s sums, their applications and their history, see Davenport, Multiplicative
number theory (Markham, Chicago, 1967), and for information and references about
Kloosterman’s sums, see Weil, Proc. Nat. Acad. Sci. U.S.A. 34 (1948), 204-7.

§ 5.8. The general theory was developed by Gauss, D.A., §§ 335-66. The first explicit
geometrical construction of the 17-agon was made by Erchinger (see Gauss, Werke, ii.
186-7). That in the text is due to Richmond, Quarterly Journal of Math. 26 (1893), 206-7,
and Math. Annalen, 67 (1909), 459-61. Our figure is copied from Richmond’s.

Gauss (D.A., § 341) proved that the equation (5.8.1) is irreducible, i.e. that its left-hand
side cannot be resolved into factors of lower degree with rational coefficients, when n is
prime. Kronecker and Eisenstein proved, more generally, that the equation satisfied by
the $\phi(n)$ primitive nth roots of unity is irreducible; see, for example, Mathews, Theory of
numbers (Cambridge, Deighton Bell, 1892), 186-8. Grandjot has shown that the theorem
can be deduced very simply from Dirichlet’s Theorem 15: see Landau, Vorlesungen, iii. 219.
6.1. Fermat’s theorem. In this chapter we apply the general ideas of
Ch. V to the proof of a series of classical theorems, due mainly to Fermat,
Euler, Legendre, and Gauss.

Theorem 70. If p is prime, then

(6.1.1)    $a^p \equiv a \pmod p.$

Theorem 71 (Fermat’s theorem). If p is prime, and $p \nmid a$, then

(6.1.2)    $a^{p-1} \equiv 1 \pmod p.$

The congruences (6.1.1) and (6.1.2) are equivalent when $p \nmid a$; and (6.1.1)
is trivial when $p \mid a$, since then $a^p \equiv 0 \equiv a$. Hence Theorems 70 and 71 are
equivalent.

Theorem 71 is a particular case of the more general

Theorem 72 (The Fermat-Euler theorem). If (a, m) = 1, then

    $a^{\phi(m)} \equiv 1 \pmod m.$

If x runs through a complete system of residues prime to m, then, by
Theorem 58, ax also runs through such a system. Hence, taking the product
of each set, we have

    $\prod (ax) \equiv \prod x \pmod m$

or

    $a^{\phi(m)} \prod x \equiv \prod x \pmod m.$

Since every number x is prime to m, their product is prime to m; and hence,
by Theorem 55,

    $a^{\phi(m)} \equiv 1 \pmod m.$

The result is plainly false if (a, m) > 1.
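The theorem is easily tested for small moduli. The sketch below is ours (the names `phi` and `fermat_euler_holds` are illustrative); it computes $\phi(m)$ by direct counting, which is adequate only for small m, and checks the congruence for every a prime to m.

```python
from math import gcd


def phi(m):
    """Euler's function: the number of x in 1..m with gcd(x, m) = 1."""
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)


def fermat_euler_holds(m):
    """True if a^phi(m) is congruent to 1 (mod m) for every a in 1..m prime to m (m >= 2)."""
    e = phi(m)
    return all(pow(a, e, m) == 1 for a in range(1, m + 1) if gcd(a, m) == 1)


print(all(fermat_euler_holds(m) for m in range(2, 200)))

# The hypothesis (a, m) = 1 cannot be dropped: 2^phi(4) = 4 is 0, not 1, (mod 4).
print(pow(2, phi(4), 4))
```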
6.2. Some properties of binomial coefficients. Euler was the first to
publish a proof of Fermat’s theorem. The proof, which is easily extended
so as to prove Theorem 72, depends on the simplest arithmetical properties
of the binomial coefficients.

Theorem 73. If m and n are positive integers, then the binomial
coefficients

    $\binom{m}{n} = \frac{m(m-1)\cdots(m-n+1)}{n!},$

    $\binom{-m}{n} = (-1)^n\,\frac{m(m+1)\cdots(m+n-1)}{n!}$

are integers.

It is the first part of the theorem which we need here, but, since

    $\binom{-m}{n} = (-1)^n \binom{m+n-1}{n},$

the two parts are equivalent. Either part may be stated in a more striking
form, viz.

Theorem 74. The product of any n successive positive integers is
divisible by n!.

The theorems are obvious from the genesis of the binomial coefficients
as the coefficients of powers of x in (1 + x)(1 + x) ... or in

    $(1-x)^{-1}(1-x)^{-1}\cdots = (1 + x + x^2 + \cdots)(1 + x + x^2 + \cdots)\cdots.$

We may prove them by induction as follows. We choose Theorem 74, which
asserts that

    $(m)_n = m(m+1)\cdots(m+n-1)$

is divisible by n!. This is plainly true for n = 1 and all m, and also for
m = 1 and all n. We assume that it is true (a) for n = N - 1 and all m and
(b) for n = N and m = M. Then

    $(M+1)_N - (M)_N = N(M+1)_{N-1},$

and $(M+1)_{N-1}$ is divisible by (N - 1)!. Hence $(M+1)_N$ is divisible by
N!, and the theorem is true for n = N and m = M + 1. It follows that the
theorem is true for n = N and all m. Since it is also true for n = N + 1 and
m = 1, we can repeat the argument; and the theorem is true generally.

Theorem 75. If p is prime, then

    $\binom{p}{1}, \binom{p}{2}, \dots, \binom{p}{p-1}$

are divisible by p.

If $1 \le n \le p - 1$, then

    $n! \mid p(p-1)\cdots(p-n+1),$

by Theorem 74. But n! is prime to p, and therefore

    $n! \mid (p-1)(p-2)\cdots(p-n+1).$

Hence

    $\binom{p}{n} = p\,\frac{(p-1)(p-2)\cdots(p-n+1)}{n!}$

is divisible by p.
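A short check in Python, offered as a sketch (the function name is ours; `math.comb` requires Python 3.8 or later). It also shows that primality matters: for the composite number 4 the middle coefficient 6 is not divisible by 4.

```python
from math import comb


def inner_binomials_divisible(p):
    """True if p divides every binomial coefficient C(p, n) with 1 <= n <= p - 1."""
    return all(comb(p, n) % p == 0 for n in range(1, p))


for p in (2, 3, 5, 7, 11, 13):
    print(p, inner_binomials_divisible(p))

print(4, inner_binomials_divisible(4))  # C(4, 2) = 6 is not divisible by 4
```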

Theorem 76. If p is prime, then all the coefficients in $(1 - x)^{-p}$ are
divisible by p, except those of $1, x^p, x^{2p}, \dots$, which are congruent to 1
(mod p).

By Theorem 73, the coefficients in

    $(1 - x)^{-p} = 1 + \sum_{n=1}^{\infty} \binom{p+n-1}{n} x^n$

are all integers. Since

    $(1 - x^p)^{-1} = 1 + x^p + x^{2p} + \dots,$

we have to prove that every coefficient in the expansion of

    $(1 - x^p)^{-1} - (1 - x)^{-p} = (1 - x)^{-p}(1 - x^p)^{-1}\{(1 - x)^p - 1 + x^p\}$

is divisible by p. Since the coefficients in the expansions of $(1 - x)^{-p}$ and
$(1 - x^p)^{-1}$ are integers it is enough to prove that every coefficient in the
polynomial $(1 - x)^p - 1 + x^p$ is divisible by p. For p = 2 this is trivial
and, for $p \ge 3$, it follows from Theorem 75 since

    $(1 - x)^p - 1 + x^p = \sum_{r=1}^{p-1} (-1)^r \binom{p}{r} x^r.$

We shall require this theorem in Ch. XIX.

Theorem 77. If p is prime, then

    $(x + y + \dots + w)^p \equiv x^p + y^p + \dots + w^p \pmod p.$

For

    $(x + y)^p \equiv x^p + y^p \pmod p,$

by Theorem 75, and the general result follows by repetition of the argument.
Another useful corollary of Theorem 75 is

Theorem 78. If $\alpha > 0$ and

    $m \equiv 1 \pmod{p^\alpha},$

then

    $m^p \equiv 1 \pmod{p^{\alpha+1}}.$

For $m = 1 + kp^\alpha$, where k is an integer, and $\alpha p \ge \alpha + 1$. Hence

    $m^p = (1 + kp^\alpha)^p = 1 + lp^{\alpha+1},$

where l is an integer.

6.3. A second proof of Theorem 72. We can now give Euler’s
proof of Theorem 72. Suppose that $m = \prod p^\alpha$. Then it is enough, after
Theorem 53, to prove that

    $a^{\phi(m)} \equiv 1 \pmod{p^\alpha}.$

But

    $\phi(m) = \prod \phi(p^\alpha) = \prod p^{\alpha-1}(p - 1),$

and so it is sufficient to prove that

    $a^{p^{\alpha-1}(p-1)} \equiv 1 \pmod{p^\alpha}$

when $p \nmid a$.

By Theorem 77,

    $(x + y + \dots)^p \equiv x^p + y^p + \dots \pmod p.$

Taking x = y = z = ... = 1, and supposing that there are a numbers, we
obtain

    $a^p \equiv a \pmod p,$

or

    $a^{p-1} \equiv 1 \pmod p.$

Hence, by Theorem 78,

    $a^{p(p-1)} \equiv 1 \pmod{p^2}, \quad a^{p^2(p-1)} \equiv 1 \pmod{p^3}, \quad \dots, \quad a^{p^{\alpha-1}(p-1)} \equiv 1 \pmod{p^\alpha}.$


6.4. Proof of Theorem 22. Before proceeding to the more important
applications of Fermat’s theorem, we use it to prove Theorem 22 of Ch. II.
We can write f(n) in the form

    $f(n) = \sum_{r=1}^{m} Q_r(n)\,a_r^n = \sum_{r=1}^{m} \Bigl(\sum_{s=0}^{q_r} c_{r,s}\,n^s\Bigr) a_r^n,$

where the a and c are integers and

    $1 \le a_1 < a_2 < \dots < a_m.$

The terms of f(n) are thus arranged in increasing order of magnitude for
large n, and f(n) is dominated by its last term

    $c_{m,q_m}\,n^{q_m}\,a_m^n$

for large n (so that the last c is positive).
If f(n) is prime for all large n, then there is an n for which

    $f(n) = p > a_m$

and p is prime. Then

    $\{n + kp(p-1)\}^s \equiv n^s \pmod p,$

for all integral k and s. Also, by Fermat’s theorem,

    $a_r^{p-1} \equiv 1 \pmod p$

and so

    $a_r^{n + kp(p-1)} \equiv a_r^n \pmod p$

for all positive integral k. Hence

    $\{n + kp(p-1)\}^s\,a_r^{n + kp(p-1)} \equiv n^s a_r^n \pmod p$

and therefore

    $f\{n + kp(p-1)\} \equiv f(n) \equiv 0 \pmod p$

for all positive integral k; a contradiction.

6.5. Quadratic residues. Let us suppose that p is an odd prime, that
$p \nmid a$, and that x is one of the numbers

    1, 2, 3, ..., p - 1.

Then, by Theorem 58, just one of the numbers

    1·x, 2·x, ..., (p - 1)x

is congruent to a (mod p). There is therefore a unique x' such that

    $xx' \equiv a \pmod p, \quad 0 < x' < p.$

We call x' the associate of x. There are then two possibilities: either there
is at least one x associated with itself, so that x' = x, or there is no such x.

(1) Suppose that the first alternative is the true one and that $x_1$ is
associated with itself. In this case the congruence

    $x^2 \equiv a \pmod p$

has the solution $x = x_1$; and we say that a is a quadratic residue of p, or
(when there is no danger of a misunderstanding) simply a residue of p, and
write a R p. Plainly

    $x = p - x_1 \equiv -x_1 \pmod p$

is another solution of the congruence. Also, if x' = x for any other value
$x_2$ of x, we have

    $x_1^2 \equiv a, \quad x_2^2 \equiv a, \quad (x_1 - x_2)(x_1 + x_2) = x_1^2 - x_2^2 \equiv 0 \pmod p.$

Hence either $x_2 \equiv x_1$ or

    $x_2 \equiv -x_1 \equiv p - x_1;$

and there are just two solutions of the congruence, namely $x_1$ and $p - x_1$.
In this case the numbers

    1, 2, ..., p - 1

may be grouped as $x_1$, $p - x_1$, and $\frac12(p - 3)$ pairs of unequal associated
numbers. Now

    $x_1(p - x_1) \equiv -x_1^2 \equiv -a \pmod p,$

while

    $xx' \equiv a \pmod p$

for any associated pair x, x'. Hence

    $(p-1)! = \prod x \equiv -a \cdot a^{\frac12(p-3)} \equiv -a^{\frac12(p-1)} \pmod p.$

(2) If the second alternative is true and no x is associated with itself, we
say that a is a quadratic non-residue of p, or simply a non-residue of p,
and write a N p. In this case the congruence

    $x^2 \equiv a \pmod p$

has no solution, and the numbers

    1, 2, ..., p - 1

may be arranged in $\frac12(p - 1)$ associated unequal pairs. Hence

    $(p-1)! \equiv a^{\frac12(p-1)} \pmod p.$
We define ‘Legendre’s symbol’ $\left(\frac{a}{p}\right)$, where p is an odd prime and a is any
number not divisible by p, by

    $\left(\frac{a}{p}\right) = +1$, if a R p,
    $\left(\frac{a}{p}\right) = -1$, if a N p.

It is plain that

    $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$

if $a \equiv b \pmod p$. We have then proved

Theorem 79. If p is an odd prime and a is not a multiple of p, then

    $(p-1)! \equiv -\left(\frac{a}{p}\right) a^{\frac12(p-1)} \pmod p.$

We have supposed p odd. It is plain that $0 = 0^2$, $1 = 1^2$, and so all
numbers, are quadratic residues of 2. We do not define Legendre’s symbol
when p = 2, and we ignore this case in what follows. Some of our theorems
are true (but trivial) when p = 2.

6.6. Special cases of Theorem 79: Wilson’s theorem. The two
simplest cases are those in which a = 1 and a = -1.

(1) First let a = 1. Then

    $x^2 \equiv 1 \pmod p$

has the solutions $x = \pm 1$; hence 1 is a quadratic residue of p and

    $\left(\frac{1}{p}\right) = 1.$

If we put a = 1 in Theorem 79, it becomes

Theorem 80 (Wilson’s theorem):

    $(p-1)! \equiv -1 \pmod p.$

Thus 11 | 3628801.
The congruence

    $(p-1)! + 1 \equiv 0 \pmod{p^2}$

is true for

    p = 5, p = 13, p = 563,

but for no other value of p less than 200000. Apparently no general theorem
concerning the congruence is known.

If m is composite, then

    $m \mid (m-1)! + 1$

is false, for there is a number d such that

    $d \mid m, \quad 1 < d < m,$

and d does not divide (m - 1)! + 1. Hence we derive

Theorem 81. If m > 1, then a necessary and sufficient condition that m
should be prime is that

    $m \mid (m-1)! + 1.$

The theorem is of course quite useless as a practical test for the primality
of a given number m.
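For small m the criterion can nevertheless be run directly, which makes both Wilson’s theorem and the three exceptional primes above easy to confirm. The following Python sketch uses names of our own choosing (`is_prime_wilson`, `wilson_square_primes`); it computes the full factorial and so is practical only for very small m, in keeping with the remark just made.

```python
from math import factorial


def is_prime_wilson(m):
    """Theorem 81: m > 1 is prime if and only if m divides (m - 1)! + 1."""
    return m > 1 and (factorial(m - 1) + 1) % m == 0


def wilson_square_primes(limit):
    """Primes p < limit for which p^2 divides (p - 1)! + 1."""
    found = []
    for p in range(2, limit):
        if is_prime_wilson(p) and (factorial(p - 1) + 1) % (p * p) == 0:
            found.append(p)
    return found


print([m for m in range(2, 30) if is_prime_wilson(m)])
print(factorial(10) + 1)           # the number 3628801 of the text
print(wilson_square_primes(600))
```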

(2) Next suppose a = -1. Then Theorems 79 and 80 show that

    $\left(\frac{-1}{p}\right) \equiv -(-1)^{\frac12(p-1)}(p-1)! \equiv (-1)^{\frac12(p-1)} \pmod p.$

Theorem 82. The number -1 is a quadratic residue of primes of the
form 4k + 1 and a non-residue of primes of the form 4k + 3, i.e.

    $\left(\frac{-1}{p}\right) = (-1)^{\frac12(p-1)}.$

More generally, combination of Theorems 79 and 80 gives

Theorem 83:

    $\left(\frac{a}{p}\right) \equiv a^{\frac12(p-1)} \pmod p.$
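Theorem 83 gives a convenient way of computing Legendre’s symbol. The sketch below is illustrative only (the names `legendre` and `is_residue_by_search` are ours); it evaluates $a^{\frac12(p-1)}$ (mod p) and compares the result with a direct search for a solution of $x^2 \equiv a$.

```python
def legendre(a, p):
    """Legendre's symbol (a/p) for an odd prime p not dividing a, by Theorem 83."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def is_residue_by_search(a, p):
    """True if x^2 = a (mod p) has a solution with 1 <= x <= p - 1."""
    return any((x * x - a) % p == 0 for x in range(1, p))


for p in (3, 5, 7, 11, 13, 17, 19, 23):
    for a in range(1, p):
        assert (legendre(a, p) == 1) == is_residue_by_search(a, p)

# Theorem 82: -1 is a residue of 13 = 4*3 + 1 and a non-residue of 7 = 4*1 + 3.
print(legendre(-1, 13), legendre(-1, 7))
```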
6.7. Elementary properties of quadratic residues and non-residues.

The numbers

(6.7.1)    $1^2, 2^2, 3^2, \dots, \{\tfrac12(p-1)\}^2$

are all incongruent; for $r^2 \equiv s^2$ implies $r \equiv s$ or $r \equiv -s \pmod p$, and the
second alternative is impossible here. Also

    $r^2 \equiv (p - r)^2 \pmod p.$

It follows that there are $\frac12(p-1)$ residues and $\frac12(p-1)$ non-residues of p.

Theorem 84. There are $\frac12(p-1)$ residues and $\frac12(p-1)$ non-residues
of an odd prime p.

We next prove

Theorem 85. The product of two residues, or of two non-residues, is a
residue, while the product of a residue and a non-residue is a non-residue.

(1) Let us write $\alpha, \alpha', \alpha_1, \dots$ for residues and $\beta, \beta', \beta_1, \dots$ for
non-residues. Then every $\alpha\alpha'$ is an $\alpha$, since

    $x^2 \equiv \alpha \ \cdot\ y^2 \equiv \alpha' \ \rightarrow\ (xy)^2 \equiv \alpha\alpha' \pmod p.$

(2) If $\alpha_1$ is a fixed residue, then

    $1\cdot\alpha_1, 2\cdot\alpha_1, 3\cdot\alpha_1, \dots, (p-1)\alpha_1$

is a complete system (mod p). Since every $\alpha\alpha_1$ is a residue, every $\beta\alpha_1$
must be a non-residue.

(3) Similarly, if $\beta_1$ is a fixed non-residue, every $\beta\beta_1$ is a residue. For

    $1\cdot\beta_1, 2\cdot\beta_1, \dots, (p-1)\beta_1$

is a complete system (mod p), and every $\alpha\beta_1$ is a non-residue, so that every
$\beta\beta_1$ is a residue.

Theorem 85 is also a corollary of Theorem 83.

We add two theorems which we shall use in Ch. XX. The first is little
but a restatement of part of Theorem 82.

Theorem 86. If p is a prime 4k + 1, then there is an x such that

    $1 + x^2 = mp,$

where 0 < m < p.



88 


FERMAT’S THEOREM AND ITS CONSEQUENCES 


[Chap. VI 


For, by Theorem 82, -1 is a residue of p, and so congruent to one of the
numbers (6.7.1), say $x^2$; and

    $0 < 1 + x^2 < 1 + (\tfrac12 p)^2 < p^2.$

Theorem 87. If p is an odd prime, then there are numbers x and y such
that

    $1 + x^2 + y^2 = mp,$

where 0 < m < p.

The $\frac12(p+1)$ numbers

(6.7.2)    $x^2 \quad (0 \le x \le \tfrac12(p-1))$

are incongruent, and so are the $\frac12(p+1)$ numbers

(6.7.3)    $-1 - y^2 \quad (0 \le y \le \tfrac12(p-1)).$

But there are p + 1 numbers in the two sets together, and only p residues
(mod p); and therefore some number (6.7.2) must be congruent to some
number (6.7.3). Hence there are an x and a y, each numerically less than
$\frac12 p$, such that

    $x^2 \equiv -1 - y^2, \quad 1 + x^2 + y^2 = mp.$

Also

    $0 < 1 + x^2 + y^2 < 1 + 2(\tfrac12 p)^2 < p^2,$

so that 0 < m < p.

Theorem 86 shows that we may take y = 0 when p = 4k + 1.
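The counting argument for Theorem 87 translates directly into a search. In the sketch below (the name `sum_of_squares_multiple` is ours) we tabulate the residues of $x^2$ for $0 \le x \le \frac12(p-1)$ and look for a y in the same range with $-1 - y^2$ congruent to one of them; the argument above guarantees that the search succeeds for every odd prime p.

```python
def sum_of_squares_multiple(p):
    """For an odd prime p, return (x, y, m) with 1 + x^2 + y^2 = m*p and 0 < m < p."""
    half = (p - 1) // 2
    # Residues of the numbers (6.7.2), each remembered with the x that produces it.
    squares = {x * x % p: x for x in range(half + 1)}
    for y in range(half + 1):
        target = (-1 - y * y) % p      # residue of a number (6.7.3)
        if target in squares:
            x = squares[target]
            return x, y, (1 + x * x + y * y) // p
    raise ValueError("no solution found; p is presumably not an odd prime")


for p in (3, 5, 7, 11, 13, 17, 19, 23):
    print(p, sum_of_squares_multiple(p))
```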


6.8. The order of a (mod m). We know, by Theorem 72, that

    $a^{\phi(m)} \equiv 1 \pmod m$

if (a, m) = 1. We denote by d the smallest positive value of x for which

(6.8.1)    $a^x \equiv 1 \pmod m,$

so that $d \le \phi(m)$.

We call the congruence (6.8.1) the proposition P(x). Then it is obvious
that P(x) and P(y) imply P(x + y). Also, if $y \le x$ and

    $a^{x-y} \equiv b \pmod m,$

then

    $a^x \equiv ba^y \pmod m,$

so that P(x) and P(y) imply P(x - y). Hence P(x) satisfies the conditions
of Theorem 69, and

    $d \mid \phi(m).$

We call d the order† of a (mod m), and say that a belongs to d (mod m).
Thus

    $2 \equiv 2, \quad 2^2 \equiv 4, \quad 2^3 \equiv 1 \pmod 7,$

and so 2 belongs to 3 (mod 7). If $d = \phi(m)$, we say that a is a primitive
root of m. Thus 2 is a primitive root of 5, since

    $2 \equiv 2, \quad 2^2 \equiv 4, \quad 2^3 \equiv 3, \quad 2^4 \equiv 1 \pmod 5;$

and 3 is a primitive root of 17. The notion of a primitive root of m bears
some analogy to the algebraical notion, explained in § 5.6, of a primitive
root of unity. We shall prove in § 7.5 that there are primitive roots of every
odd prime p.

We can sum up what we have proved in the form

Theorem 88. Any number a prime to m belongs (mod m) to a divisor of
$\phi(m)$: if d is the order of a (mod m), then $d \mid \phi(m)$. If m is a prime p, then
$d \mid (p - 1)$. The congruence $a^x \equiv 1 \pmod m$ is true or false according as
x is or is not a multiple of d.
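The order can be found by simply running through the powers of a. The following Python sketch (the function name `order` is ours) does this and reproduces the three examples of the text; it assumes $m \ge 2$ and (a, m) = 1, and raises an error otherwise.

```python
from math import gcd


def order(a, m):
    """Smallest d >= 1 with a^d congruent to 1 (mod m); needs m >= 2 and gcd(a, m) = 1."""
    if m < 2 or gcd(a, m) != 1:
        raise ValueError("order(a, m) needs m >= 2 and a prime to m")
    d, power = 1, a % m
    while power != 1:
        power = power * a % m
        d += 1
    return d


print(order(2, 7))    # 2 belongs to 3 (mod 7)
print(order(2, 5))    # 2 is a primitive root of 5
print(order(3, 17))   # 3 is a primitive root of 17
```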

6.9. The converse of Fermat’s theorem. The direct converse of
Fermat’s theorem is false; it is not true that, if $m \nmid a$ and

(6.9.1)    $a^{m-1} \equiv 1 \pmod m,$

then m is necessarily a prime. It is not even true that, if (6.9.1) is true for
all a prime to m, then m is prime. Suppose, for example, that m = 561 =
3·11·17. If $3 \nmid a$, $11 \nmid a$, $17 \nmid a$, we have

    $a^2 \equiv 1 \pmod 3, \quad a^{10} \equiv 1 \pmod{11}, \quad a^{16} \equiv 1 \pmod{17}$

by Theorem 71. But 2 | 560, 10 | 560, 16 | 560 and so $a^{560} \equiv 1$ to each of
the moduli 3, 11, 17 and so to the modulus 3·11·17 = 561.

† Often called the index; but this word has a quite different meaning in the theory of groups.

If (6.9.1) is true for a particular a and a composite m, we say that m
is a pseudo-prime with respect to a. If m is a pseudo-prime with respect
to every a such that (a, m) = 1, we call m a Carmichael number. It is
not known whether there is an infinity of Carmichael numbers,† nor even
whether there is an infinity of composite m such that $2^m \equiv 2$ and $3^m \equiv 3$
(mod m). But we can prove

Theorem 89. There is an infinity of pseudo-primes with respect to every
a > 1.

Let p be any odd prime which does not divide $a(a^2 - 1)$. We take

(6.9.2)    $m = \frac{a^{2p} - 1}{a^2 - 1} = \left(\frac{a^p - 1}{a - 1}\right)\left(\frac{a^p + 1}{a + 1}\right),$

so that m is clearly composite. Now

    $(a^2 - 1)(m - 1) = a^{2p} - a^2 = a(a^{p-1} - 1)(a^p + a).$

Since a and $a^p$ are both odd or both even, $2 \mid (a^p + a)$. Again $a^{p-1} - 1$ is
divisible by p (after Theorem 71) and by $a^2 - 1$, since p - 1 is even. Since
$p \nmid (a^2 - 1)$, this means that $p(a^2 - 1) \mid (a^{p-1} - 1)$. Hence

    $2p(a^2 - 1) \mid (a^2 - 1)(m - 1),$

so that $2p \mid (m - 1)$ and m = 1 + 2pu for some integral u. Now, to modulus m,

    $a^{2p} = 1 + m(a^2 - 1) \equiv 1, \quad a^{m-1} = a^{2pu} \equiv 1,$

and this is (6.9.1). Since we have a different value of m for every odd p
which does not divide $a(a^2 - 1)$, the theorem is proved.
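The construction (6.9.2) is explicit enough to be tried out. In the sketch below the function name `pseudoprime_from` is ours, and the caller is assumed to supply an odd prime p; with a = 2 and p = 5 it yields 341 = 11·31, and with a = 3 and p = 5 it yields 7381 = 121·61. The last line confirms the remark about 561.

```python
from math import gcd


def pseudoprime_from(a, p):
    """m = (a^(2p) - 1) / (a^2 - 1) as in (6.9.2); p is assumed to be an odd prime."""
    if (a * (a * a - 1)) % p == 0:
        raise ValueError("p must not divide a(a^2 - 1)")
    return (a ** (2 * p) - 1) // (a * a - 1)


for a, p in ((2, 5), (2, 7), (3, 5), (3, 7)):
    m = pseudoprime_from(a, p)
    # m is composite by construction, and (6.9.1) holds for this a.
    print(a, p, m, pow(a, m - 1, m) == 1)

# 561 satisfies (6.9.1) for every a prime to 561.
print(all(pow(a, 560, 561) == 1 for a in range(1, 561) if gcd(a, 561) == 1))
```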

A correct converse of Theorem 71 is

Theorem 90. If $a^{m-1} \equiv 1 \pmod m$ and $a^x \not\equiv 1 \pmod m$ for any divisor
x of m - 1 less than m - 1, then m is prime.

Clearly (a, m) = 1. If d is the order of a (mod m), then $d \mid (m - 1)$ and
$d \mid \phi(m)$ by Theorem 88. Since $a^d \equiv 1$, we must have d = m - 1 and so
$(m - 1) \mid \phi(m)$. But

    $\phi(m) = m \prod_{p \mid m} \left(1 - \frac{1}{p}\right) < m - 1$

if m is composite, and therefore m must be prime.

† This has now been settled, see the end of chapter notes.
6.10. Divisibility of $2^{p-1} - 1$ by $p^2$. By Fermat’s theorem

    $2^{p-1} - 1 \equiv 0 \pmod p$

if p > 2. Is it ever true that

    $2^{p-1} - 1 \equiv 0 \pmod{p^2}$?

This question is of importance in the theory of ‘Fermat’s last theorem’ (see
Ch. XIII). The phenomenon does occur, but very rarely.

Theorem 91. There is a prime p for which

    $2^{p-1} - 1 \equiv 0 \pmod{p^2}.$

In fact this is true when p = 1093, as can be shown by straightforward
calculation. We give a shorter proof, in which all congruences are to
modulus $p^2 = 1194649$.

In the first place,

(6.10.1)    $3^7 = 2187 = 2p + 1, \quad 3^{14} = (2p + 1)^2 \equiv 4p + 1.$

Next

    $2^{14} = 16384 = 15p - 11, \quad 2^{28} \equiv -330p + 121,$

    $3^2 \cdot 2^{28} \equiv -2970p + 1089 = -2969p - 4 \equiv -1876p - 4,$

and so

    $3^2 \cdot 2^{26} \equiv -469p - 1.$

Hence, by the binomial theorem,

    $3^{14} \cdot 2^{182} \equiv -(469p + 1)^7 \equiv -3283p - 1 \equiv -4p - 1 \equiv -3^{14}$

by (6.10.1). It follows that

    $2^{182} \equiv -1, \quad 2^{1092} \equiv 1 \pmod{1093^2}.$

The same result is true for p = 3511 but for no other $p < 3 \times 10^7$.
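The ‘straightforward calculation’ is immediate with modular exponentiation. The Python sketch below is ours (the names `is_prime` and `square_divides` are illustrative, and the primality test is plain trial division); it confirms the congruences for 1093 and 3511 and searches the primes below 4000 only, not the full range quoted above.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def square_divides(p):
    """True if p^2 divides 2^(p-1) - 1."""
    return pow(2, p - 1, p * p) == 1


print(pow(2, 182, 1093 ** 2) == 1093 ** 2 - 1)   # 2^182 = -1 (mod 1093^2)
print([p for p in range(3, 4000) if is_prime(p) and square_divides(p)])
```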
6.11. Gauss’s lemma and the quadratic character of 2. If p is an odd
prime, there is just one residue† of n (mod p) between $-\frac12 p$ and $\frac12 p$. We
call this residue the minimal residue of n (mod p); it is positive or negative
according as the least non-negative residue of n lies between 0 and $\frac12 p$ or
between $\frac12 p$ and p.

† Here, of course, ‘residue’ has its usual meaning and is not an abbreviation of ‘quadratic residue’.

We now suppose that m is an integer, positive or negative, not divisible
by p, and consider the minimal residues of the $\frac12(p-1)$ numbers

(6.11.1)    $m, 2m, 3m, \dots, \tfrac12(p-1)m.$

We can write these residues in the form

    $r_1, r_2, \dots, r_\lambda, \quad -r'_1, -r'_2, \dots, -r'_\mu,$

where

    $\lambda + \mu = \tfrac12(p-1), \quad 0 < r_i < \tfrac12 p, \quad 0 < r'_i < \tfrac12 p.$

Since the numbers (6.11.1) are incongruent, no two r can be equal, and no
two r'. If an r and an r' are equal, say $r_i = r'_j$, let am, bm be the two of the
numbers (6.11.1) such that

    $am \equiv r_i, \quad bm \equiv -r'_j \pmod p.$

Then

    $am + bm \equiv 0 \pmod p,$

and so

    $a + b \equiv 0 \pmod p,$

which is impossible because $0 < a < \frac12 p$, $0 < b < \frac12 p$.

It follows that the numbers $r_i$, $r'_j$ are a rearrangement of the numbers

    $1, 2, \dots, \tfrac12(p-1);$

and therefore that

    $m \cdot 2m \cdots \tfrac12(p-1)m \equiv (-1)^\mu\, 1 \cdot 2 \cdots \tfrac12(p-1) \pmod p,$

and so

    $m^{\frac12(p-1)} \equiv (-1)^\mu \pmod p.$

But

    $\left(\frac{m}{p}\right) \equiv m^{\frac12(p-1)} \pmod p,$

by Theorem 83. Hence we obtain

Theorem 92 (Gauss’s lemma). $\left(\frac{m}{p}\right) = (-1)^\mu$, where $\mu$ is the number of
members of the set

    $m, 2m, 3m, \dots, \tfrac12(p-1)m$

whose least positive residues (mod p) are greater than $\frac12 p$.
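Gauss’s lemma can be compared numerically with Theorem 83. The sketch below (function names ours) counts $\mu$ directly from the definition and checks that $(-1)^\mu$ agrees with the value of $m^{\frac12(p-1)}$ (mod p) for several small odd primes; negative m are allowed, since Python’s `%` operator returns the least non-negative residue.

```python
def gauss_mu(m, p):
    """Number of k in 1..(p-1)/2 for which the least positive residue of k*m exceeds p/2."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (k * m) % p > half)


def legendre(a, p):
    """Legendre's symbol via Theorem 83 (p an odd prime not dividing a)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
    for m in range(1, p):
        assert (-1) ** gauss_mu(m, p) == legendre(m, p)

print(gauss_mu(2, 17), gauss_mu(-3, 13))
```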

Let us take in particular m = 2, so that the numbers (6.11.1) are

    2, 4, ..., p - 1.

In this case $\lambda$ is the number of positive even integers less than $\frac12 p$.

We introduce here a notation which we shall use frequently later. We
write [x] for the ‘integral part of x’, the largest integer which does not
exceed x. Thus

    x = [x] + f,

where $0 \le f < 1$. For example,

    $[\tfrac52] = 2, \quad [\tfrac12] = 0, \quad [-\tfrac32] = -2.$

With this notation

    $\lambda = [\tfrac14 p].$

But

    $\lambda + \mu = \tfrac12(p-1),$

and so

    $\mu = \tfrac12(p-1) - [\tfrac14 p].$

If $p \equiv 1 \pmod 4$, then

    $\mu = \tfrac12(p-1) - \tfrac14(p-1) = \tfrac14(p-1) = [\tfrac14(p+1)],$

and if $p \equiv 3 \pmod 4$, then

    $\mu = \tfrac12(p-1) - \tfrac14(p-3) = \tfrac14(p+1) = [\tfrac14(p+1)].$

Hence

    $\left(\frac{2}{p}\right) = (-1)^\mu = (-1)^{[\frac14(p+1)]},$

that is to say

    $\left(\frac{2}{p}\right) = 1$, if p = 8n + 1 or 8n - 1,
    $\left(\frac{2}{p}\right) = -1$, if p = 8n + 3 or 8n - 3.

If $p = 8n \pm 1$, then $\frac18(p^2 - 1)$ is even, while if $p = 8n \pm 3$, it is odd.
Hence

    $(-1)^{[\frac14(p+1)]} = (-1)^{\frac18(p^2-1)}.$

Summing up, we have the following theorems.

Theorem 93:

    $\left(\frac{2}{p}\right) = (-1)^{[\frac14(p+1)]}.$

Theorem 94:

    $\left(\frac{2}{p}\right) = (-1)^{\frac18(p^2-1)}.$

Theorem 95. 2 is a quadratic residue of primes of the form $8n \pm 1$ and
a quadratic non-residue of primes of the form $8n \pm 3$.

Gauss’s lemma may be used to determine the primes of which any given
integer m is a quadratic residue. For example, let us take m = -3, and
suppose that p > 3. The numbers (6.11.1) are

    $-3a \quad (1 \le a < \tfrac12 p),$

and $\mu$ is the number of these numbers whose least positive residues lie
between $\frac12 p$ and p. Now

    $-3a \equiv p - 3a \pmod p,$

and p - 3a lies between $\frac12 p$ and p if $1 \le a < \frac16 p$. If $\frac16 p < a < \frac13 p$, then
p - 3a lies between 0 and $\frac12 p$. If $\frac13 p < a < \frac12 p$, then

    $-3a \equiv 2p - 3a \pmod p,$

and 2p - 3a lies between $\frac12 p$ and p. Hence the values of a which satisfy the
condition are

    $1, 2, \dots, [\tfrac16 p], \quad [\tfrac13 p] + 1, [\tfrac13 p] + 2, \dots, [\tfrac12 p],$

and

    $\mu = [\tfrac16 p] + [\tfrac12 p] - [\tfrac13 p].$

If p = 6n + 1 then $\mu = n + 3n - 2n$ is even, and if p = 6n + 5 then

    $\mu = n + (3n + 2) - (2n + 1)$

is odd.

Theorem 96. -3 is a quadratic residue of primes of the form 6n + 1 and
a quadratic non-residue of primes of the form 6n + 5.

A further example, which we leave for the moment† to the reader, is

Theorem 97. 5 is a quadratic residue of primes of the form $10n \pm 1$ and
a quadratic non-residue of primes of the form $10n \pm 3$.

6.12. The law of reciprocity. The most famous theorem in this field is
Gauss’s ‘law of reciprocity’.

Theorem 98. If p and q are odd primes, then

    $\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{p'q'},$

where

    $p' = \tfrac12(p - 1), \quad q' = \tfrac12(q - 1).$

Since p'q' is even if either p or q is of the form 4n + 1, and odd if both
are of the form 4n + 3, we can also state the theorem as

Theorem 99. If p and q are odd primes, then

    $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right),$

unless both p and q are of the form 4n + 3, in which case

    $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right).$

We require a lemma.

† See § 6.13 for a proof depending on Gauss’s law of reciprocity.
Theorem 100.† If

    $S(q, p) = \sum_{s=1}^{p'} \left[\frac{sq}{p}\right],$

then

    $S(q, p) + S(p, q) = p'q'.$

The proof may be stated in a geometrical form. In the figure (Fig. 6) AC
and BC are x = p, y = q, and KM and LM are x = p', y = q'.

Fig. 6.

If (as in the figure) p > q, then q'/p' < q/p, and M falls below the
diagonal OC. Since

    $q' < \frac{qp'}{p} < q' + 1,$

there is no integer between KM = q' and KN = qp'/p.

We count up, in two different ways, the number of lattice points in the
rectangle OKML, counting the points on KM and LM but not those on the
axes. In the first place, this number is plainly p'q'. But there are no lattice
points on OC (since p and q are prime), and none in the triangle PMN
except perhaps on PM. Hence the number of lattice points in OKML is the
sum of those in the triangles OKN and OLP (counting those on KN and
LP but not those on the axes).

† The notation has no connection with that of § 5.6.

The number on ST, the line x = s, is [sq/p], since sq/p is the ordinate of
T. Hence the number in OKN is

    $\sum_{s=1}^{p'} \left[\frac{sq}{p}\right] = S(q, p).$

Similarly, the number in OLP is S(p, q), and the conclusion follows.
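The lattice-point identity is easy to confirm by direct summation. In the Python sketch below the function `S` follows the definition in Theorem 100 (integer division plays the part of the integral part [x]); the list of primes is our own choice of test data.

```python
def S(q, p):
    """S(q, p) = sum of [s*q/p] for s = 1, ..., (p - 1)/2."""
    return sum(s * q // p for s in range(1, (p - 1) // 2 + 1))


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
for p in odd_primes:
    for q in odd_primes:
        if p != q:
            assert S(q, p) + S(p, q) == ((p - 1) // 2) * ((q - 1) // 2)

print(S(3, 5), S(5, 3))   # 1 and 1, whose sum is p'q' = 2 * 1
```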
6.13. Proof of the law of reciprocity. We can write

(6.13.1)    $kq = p\left[\frac{kq}{p}\right] + u_k,$

where

    $1 \le k \le p', \quad 1 \le u_k \le p - 1.$

Here $u_k$ is the least positive residue of kq (mod p). If $u_k = v_k \le p'$, then
$u_k$ is one of the minimal residues $r_i$ of § 6.11, while if $u_k = w_k > p'$, then
$u_k - p$ is one of the minimal residues $-r'_j$. Thus

    $r_i = v_k, \quad r'_j = p - w_k$

for every i, j, and some k.

The $r_i$ and $r'_j$ are (as we saw in § 6.11) the numbers 1, 2, ..., p' in some
order. Hence, if

    $R = \sum r_i = \sum v_k, \quad R' = \sum r'_j = \sum (p - w_k) = \mu p - \sum w_k$

(where $\mu$ is, as in § 6.11, the number of the $r'_j$), we have

    $R + R' = \sum_{\nu=1}^{p'} \nu = \frac12 \cdot \frac{p-1}{2} \cdot \frac{p+1}{2} = \frac{p^2 - 1}{8},$

and so

(6.13.2)    $\mu p + \sum v_k - \sum w_k = \tfrac18(p^2 - 1).$

On the other hand, summing (6.13.1) from k = 1 to k = p', we have

(6.13.3)    $\tfrac18 q(p^2 - 1) = pS(q, p) + \sum u_k = pS(q, p) + \sum v_k + \sum w_k.$

From (6.13.2) and (6.13.3) we deduce

(6.13.4)    $\tfrac18(p^2 - 1)(q - 1) = pS(q, p) + 2\sum w_k - \mu p.$

Now q - 1 is even, and $p^2 - 1 \equiv 0 \pmod 8$;† so that the left-hand side
of (6.13.4) is even, and also the second term on the right. Hence (since p
is odd)

    $S(q, p) \equiv \mu \pmod 2,$

and therefore, by Theorem 92,

    $\left(\frac{q}{p}\right) = (-1)^\mu = (-1)^{S(q,p)}.$

Finally,

    $\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{S(q,p) + S(p,q)} = (-1)^{p'q'},$

by Theorem 100.
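The law of reciprocity itself can be checked for small primes. This is only a sketch: the names `legendre` and `reciprocity_holds` are ours, and Legendre’s symbol is computed from Theorem 83 rather than from the definition.

```python
def legendre(a, p):
    """Legendre's symbol (a/p) via Theorem 83; p an odd prime not dividing a."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def reciprocity_holds(p, q):
    """Check (p/q)(q/p) = (-1)^(p'q') for distinct odd primes p and q."""
    lhs = legendre(p, q) * legendre(q, p)
    rhs = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return lhs == rhs


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(all(reciprocity_holds(p, q) for p in odd_primes for q in odd_primes if p != q))

# Both 3 and 7 are of the form 4n + 3, so the two symbols differ in sign.
print(legendre(3, 7), legendre(7, 3))
```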

We now use the law of reciprocity to prove Theorem 97. If

    p = 10n + k,

where k is 1, 3, 7, or 9, then (since 5 is of the form 4n + 1)

    $\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right) = \left(\frac{10n + k}{5}\right) = \left(\frac{k}{5}\right).$

The residues of 5 are 1 and 4. Hence 5 is a residue of primes 5n + 1 and
5n + 4, i.e. of primes 10n + 1 and 10n + 9, and a non-residue of the other
odd primes.

6.14. Tests for primality. We now prove two theorems which provide 
tests for the primality of numbers of certain special forms. Both are closely 
related to Fermat’s Theorem. 

Theorem 101 . If p > 2, h < p,n = hp + 1 or hpl 2 + 1 and 
(6.14.1) 2 h ffel, 2 n ~ l = 1 (mod n), 

then n is prime. 

We write $n = hp^b + 1$, where $b = 1$ or 2, and suppose $d$ to be the order
of 2 (mod $n$). After Theorem 88, it follows from (6.14.1) that $d \nmid h$ and
$d \mid (n - 1)$, i.e. $d \mid hp^b$. Hence $p \mid d$. But, by Theorem 88 again, $d \mid \phi(n)$ and so
$p \mid \phi(n)$. If

$$n = p_1^{a_1} \cdots p_k^{a_k},$$

we have

$$\phi(n) = p_1^{a_1 - 1} \cdots p_k^{a_k - 1}(p_1 - 1) \cdots (p_k - 1)$$

and so, since $p \nmid n$, $p$ divides at least one of $p_1 - 1, p_2 - 1, \ldots, p_k - 1$.
Hence $n$ has a prime factor $P \equiv 1$ (mod $p$).

† If $p = 2n + 1$ then $p^2 - 1 = 4n(n + 1) \equiv 0$ (mod 8).

Let $n = Pm$. Since $n \equiv 1 \equiv P$ (mod $p$), we have $m \equiv 1$ (mod $p$). If
$m > 1$, then

(6.14.2)  $n = (up + 1)(vp + 1), \quad 1 \le u \le v$

and

$$hp^{b-1} = uvp + u + v.$$

If $b = 1$, this is $h = uvp + u + v$ and so

$$p \le uvp < h < p,$$

a contradiction. If $b = 2$,

$$hp = uvp + u + v, \quad p \mid (u + v), \quad u + v \ge p$$

and so

$$2v \ge u + v \ge p, \quad v > \tfrac{1}{2}p$$

and

$$uv < h < p, \quad uv \le p - 2, \quad u \le \frac{p - 2}{v} < \frac{2(p - 2)}{p} < 2.$$

Hence $u = 1$ and so

$$v \ge p - 1, \quad uv \ge p - 1,$$

a contradiction. Hence (6.14.2) is impossible and $m = 1$ and $n = P$.

Theorem 102. Let $m \ge 2$, $h < 2^m$ and $n = h2^m + 1$ be a quadratic non-
residue (mod $p$) for some odd prime $p$. Then the necessary and sufficient
condition for $n$ to be a prime is that

(6.14.3)  $p^{\frac{1}{2}(n-1)} \equiv -1 \pmod{n}.$

First let us suppose $n$ prime. Since $n \equiv 1$ (mod 4), we have

$$\left(\frac{p}{n}\right) = \left(\frac{n}{p}\right) = -1$$

by Theorem 99. Then (6.14.3) follows at once by Theorem 83. Hence the
condition is necessary.

Now let us suppose (6.14.3) true. Let $P$ be any prime factor of $n$ and let
$d$ be the order of $p$ (mod $P$). We have

$$p^{\frac{1}{2}(n-1)} \equiv -1, \quad p^{n-1} \equiv 1, \quad p^{P-1} \equiv 1 \pmod{P}$$

and so, by Theorem 88,

$$d \nmid \tfrac{1}{2}(n - 1), \quad d \mid (n - 1), \quad d \mid (P - 1),$$

that is

$$d \nmid 2^{m-1}h, \quad d \mid 2^m h, \quad d \mid (P - 1),$$

so that $2^m \mid d$ and $2^m \mid (P - 1)$. Hence $P = 2^m x + 1$.

Since $n \equiv 1 \equiv P$ (mod $2^m$), we have $n/P \equiv 1$ (mod $2^m$) and so

$$n = (2^m x + 1)(2^m y + 1), \quad x \ge 1, \ y \ge 0.$$

Hence

$$2^m xy < 2^m xy + x + y = h < 2^m, \quad y = 0,$$

and $n = P$. The condition is therefore sufficient.

If we put $h = 1$, $m = 2^k$, we have $n = F_k$ in the notation of § 2.4.
Since $1^2 \equiv 2^2 \equiv 1$ (mod 3) and $F_k \equiv 2$ (mod 3), $F_k$ is a non-residue
(mod 3). Hence a necessary and sufficient condition that $F_k$ be prime is
that $F_k \mid (3^{\frac{1}{2}(F_k - 1)} + 1)$.
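The criterion for $F_k$ is easy to try on the first few Fermat numbers. The sketch below is an illustration only (the function names are not from the text). It assumes $k \ge 1$, since Theorem 102 requires $m = 2^k \ge 2$, and uses modular exponentiation so that the large power is never formed explicitly.

```python
def fermat_number(k):
    """F_k = 2^(2^k) + 1."""
    return 2 ** (2 ** k) + 1


def fermat_number_is_prime(k):
    """Test F_k (k >= 1) by checking F_k | 3^((F_k - 1)/2) + 1."""
    F = fermat_number(k)
    return pow(3, (F - 1) // 2, F) == F - 1
```

Applied to $k = 1, \ldots, 4$ the test reports primes, and it reports $F_5$ and $F_6$ as composite without exhibiting any factor.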

6.15. Factors of Mersenne numbers; a theorem of Euler. We return
for the moment to the problem of Mersenne’s numbers, mentioned in § 2.5.
There is one simple criterion, due to Euler, for the factorability of $M_p =
2^p - 1$.

Theorem 103. If $k > 1$ and $p = 4k + 3$ is prime, then a necessary and
sufficient condition that $2p + 1$ should be prime is that

(6.15.1)  $2^p \equiv 1 \pmod{2p + 1}.$

Thus, if $2p + 1$ is prime, $(2p + 1) \mid M_p$ and $M_p$ is composite.

First let us suppose that $2p + 1 = P$ is prime. By Theorem 95, since
$P \equiv 7$ (mod 8), 2 is a quadratic residue (mod $P$) and

$$2^p = 2^{\frac{1}{2}(P-1)} \equiv 1 \pmod{P}$$

by Theorem 83. The condition (6.15.1) is therefore necessary and $P \mid M_p$.
But $k > 1$ and so $p > 3$ and $M_p = 2^p - 1 > 2p + 1 = P$. Hence $M_p$ is
composite.

Next, suppose that (6.15.1) is true. In Theorem 101, put $h = 2$, $n =
2p + 1$. Clearly $h < p$ and $2^h = 4 \not\equiv 1$ (mod $n$) and, by (6.15.1),

$$2^{n-1} = 2^{2p} \equiv 1 \pmod{n}.$$

Hence $n$ is prime and the condition (6.15.1) is sufficient.

Theorem 103 contains the simplest criterion known for the character of 
Mersenne numbers. The first eight cases in which this test gives a factor 
of M p are those for which 

p= 11, 23, 83, 131, 179, 191, 239, 251. 
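The list of eight cases can be reproduced by a direct search. This is a rough sketch under the hypotheses of Theorem 103 ($p = 4k + 3$ prime with $k > 1$); the helper names are illustrative, and primality of $p$ is tested by plain trial division.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_mersenne_cases(count):
    """First `count` primes p = 4k + 3 (k > 1) with 2^p = 1 (mod 2p + 1)."""
    found = []
    k = 2
    while len(found) < count:
        p = 4 * k + 3
        if is_prime(p) and pow(2, p, 2 * p + 1) == 1:
            found.append(p)
        k += 1
    return found
```

For each $p$ returned, $2p + 1$ is a prime factor of $M_p$; for example $23 \mid M_{11} = 2047$.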


NOTES 

§ 6.1. Fermat stated his theorem in 1640 (Œuvres, ii. 209). Euler’s first proof dates from
1736, and his generalization from 1760. See Dickson, History, i, ch. iii, for full information.

§ 6.5. Legendre introduced ‘Legendre’s symbol’ in his Essai sur la theorie des nombres, 
first published in 1798. See, for example, § 135 of the second edition (1808). 

§ 6.6. Wilson’s theorem was first published by Waring, Meditationes algebraicae (1770),
288. There is evidence that it was known long before to Leibniz. Goldberg (Journ. London
Math. Soc. 28 (1953), 252-6) gives the residue of $(p - 1)! + 1$ to modulus $p^2$ for $p < 10000$.
See E. H. Pearson [Math. Computation 17 (1963), 194-5] for the statement about the
congruence (mod $p^2$). By 2007, the computation had been extended to $5 \times 10^8$ without
finding further examples.

§ 6.7. We can use Theorem 85 to find an upper bound for $q$, the least positive quadratic
non-residue (mod $p$). Let $m = [p/q] + 1$, so that $p < mq < p + q$. Since $0 < mq - p < q$,
we see that $mq - p$ must be a quadratic residue and so must $mq$. Hence $m$ is a quadratic
non-residue and so $q \le m$. Hence $q^2 < p + q$ and $q < \sqrt{p + \frac{1}{4}} + \frac{1}{2}$. Burgess (Mathematika
4 (1957), 106-12) proved that $q = O(p^{\alpha})$ as $p \to \infty$ for any fixed $\alpha > \frac{1}{4}e^{-1/2}$.

§ 6.9. Theorem 89 is due to Cipolla, Annali di Mat. (3), 9 (1903), 139-60. Amongst
others the following are Carmichael numbers, viz. 3.11.17, 5.13.17, 5.17.29, 5.29.73,
7.13.19. Apart from these, the pseudo-primes with respect to 2 which are less than 2000 are


341 = 11.31, 645 = 3.5.43, 1387 = 19.73, 1905 = 3.5.127. 


See Dickson, History, i. 91-95, Lehmer, Amer. Math. Monthly, 43 (1936), 347-54, and
Leveque, Reviews, 1, 47-53 for further references.

It has been shown by Alford, Granville, and Pomerance (Ann. of Math. (2) 139 (1994),
703-22) that there are in fact infinitely many Carmichael numbers. Indeed the numbers they
construct are coprime to 6, yielding composite integers $m$ for which $2^m \equiv 2$ and $3^m \equiv 3$
(mod $m$). It had been shown in 1899 by Korselt (L’intermédiaire des math. 6 (1899), 142-3)
that $n$ is a Carmichael number if and only if $n$ is square-free and $(p - 1) \mid (n - 1)$ for every prime
$p \mid n$.

Theorem 90 is due to Lucas, Amer. Journal of Math. 1 (1878), 302. It has been modified
in various ways by D. H. Lehmer and others in order to obtain practicable tests for the 
prime or composite character of a given large m. See Lehmer, loc. cit., and Bulletin Amer. 
Math. Soc. 33 (1927), 327-40, and 34 (1928), 54-56, and Duparc, Simon Stevin 29 (1952), 
21-24. 

§ 6.10. The proof is that of Landau, Vorlesungen, iii. 275, improved by R. F. Whitehead.
Theorem 91 for $p = 3511$ is due to Beeger. See also Pearson (loc. cit. above) and Fröberg
(Computers in Math. Research (North Holland, 1968), 84-88) for the numerical statement
at the end. It is now (2007) known that there are no further primes below $1.25 \times 10^{15}$ with
the property described.

§§ 6.11-13. Theorem 95 was first proved by Euler. Theorem 98 was stated by Euler 
and Legendre, but the first satisfactory proofs were by Gauss. See Bachmann, Niedere 
Zahlentheorie, i, ch. 6, for the history of the subject, and many other proofs.

§ 6.14. Miller and Wheeler took the known prime $2^{127} - 1$ as $p$ in Theorem 101 and
found $n = 190p^2 + 1$ to satisfy the test. See our note to § 2.5. Theorem 101 is also true
when $n = hp^3 + 1$, provided that $h < \sqrt{p}$ and that $h$ is not a cube. See Wright, Math.
Gazette, 37 (1953), 104-6.

Robinson extended Theorem 102 (Amer. Math. Monthly, 64 (1957), 703-10) and he and
Selfridge used the case $p = 3$ of the theorem to find a large number of primes of the form
$h \cdot 2^m + 1$ (Math. tables and other aids to computation, 11 (1957), 21-22). Amongst these
primes are several factors of Fermat numbers. See also the note to § 15.5.

Lucas [Théorie des nombres, i (1891), p. xii] stated the test for the primality of $F_k$.
Hurwitz [Math. Werke, ii. 747] gave a proof. $F_7$ and $F_{10}$ were proved composite by this
test, though actual factors were subsequently found.

The most important development in this area is undoubtedly the result of Agrawal, Kayal,
and Saxena (Ann. of Math. (2) 160 (2004), 781-93), which gives a primality test, based 
ultimately on Fermat’s Theorem, which takes time of order (log n) c to test the number n. 
Here c is a numerical constant, which one can take to be 6 according to work of Lenstra 
and Pomerance. 

§ 6.15. Theorem 103; Euler, Comm. Acad. Petrop. 6 (1732-3), 103 [Opera (1), ii. 3]. 



VII 


GENERAL PROPERTIES OF CONGRUENCES 

7.1. Roots of congruences. An integer $x$ which satisfies the congruence

$$f(x) = c_0 x^n + c_1 x^{n-1} + \cdots + c_n \equiv 0 \pmod{m}$$

is said to be a root of the congruence or a root of $f(x)$ (mod $m$). If $a$ is
such a root, then so is any number congruent to $a$ (mod $m$). Congruent roots
are considered equivalent; when we say that the congruence has $l$ roots,
we mean that it has $l$ incongruent roots.

An algebraic equation of degree $n$ has (with appropriate conventions) just
$n$ roots, and a polynomial of degree $n$ is the product of $n$ linear factors. It is
natural to inquire whether there are analogous theorems for congruences,
and the consideration of a few examples shows at once that they cannot be
so simple. Thus

(7.1.1)  $x^{p-1} - 1 \equiv 0 \pmod{p}$

has $p - 1$ roots, viz.

$$1, 2, \ldots, p - 1,$$

by Theorem 71;

(7.1.2)  $x^4 - 1 \equiv 0 \pmod{16}$

has 8 roots, viz. 1, 3, 5, 7, 9, 11, 13, 15; and

(7.1.3)  $x^4 - 2 \equiv 0 \pmod{16}$

has no root. The possibilities are plainly much more complex than they are 
for an algebraic equation. 
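The three examples can be confirmed by exhaustive search over a complete set of residues. The following is a small sketch; the function name and the convention of listing coefficients from $c_0$ (highest degree) to $c_n$ are choices made here, not part of the text.

```python
def roots(coeffs, m):
    """Incongruent roots of c0*x^n + ... + cn = 0 (mod m); coeffs = [c0, ..., cn]."""
    def f(x):
        value = 0
        for c in coeffs:  # Horner's rule, reduced mod m at each step
            value = (value * x + c) % m
        return value
    return [x for x in range(m) if f(x) == 0]
```

Thus `roots([1, 0, 0, 0, -1], 16)` returns the eight odd residues, while `roots([1, 0, 0, 0, -2], 16)` returns an empty list.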

7.2. Integral polynomials and identical congruences. If $c_0, c_1, \ldots, c_n$
are integers then

$$c_0 x^n + c_1 x^{n-1} + \cdots + c_n$$

is called an integral polynomial. If

$$f(x) = \sum_{r=0}^{n} c_r x^{n-r}, \qquad g(x) = \sum_{r=0}^{n} c'_r x^{n-r},$$

and $c_r \equiv c'_r$ (mod $m$) for every $r$, then we say that $f(x)$ and $g(x)$ are
congruent to modulus $m$, and write

$$f(x) \equiv g(x) \pmod{m}.$$

Plainly

$$f(x) \equiv g(x) \ \rightarrow \ f(x)h(x) \equiv g(x)h(x)$$

if $h(x)$ is any integral polynomial.

In what follows we shall use the symbol ‘≡’ in two different senses, the
sense of § 5.2, in which it expresses a relation between numbers, and the
sense just defined, in which it expresses a relation between polynomials.
There should be no confusion because, except in the phrase ‘the congruence
$f(x) \equiv 0$’, the variable $x$ will occur only when the symbol is used in the
second sense. When we assert that $f(x) \equiv g(x)$, or $f(x) \equiv 0$, we are using
it in this sense, and there is no reference to any numerical value of $x$. But
when we make an assertion about ‘the roots of the congruence $f(x) \equiv 0$’,
or discuss ‘the solution of the congruence’, it is naturally the first sense
which we have in mind.

In the next section we introduce a similar double use of the symbol ‘|’.

Theorem 104. (i) If $p$ is prime and

$$f(x)g(x) \equiv 0 \pmod{p},$$

then either $f(x) \equiv 0$ or $g(x) \equiv 0$ (mod $p$).

(ii) More generally, if

$$f(x)g(x) \equiv 0 \pmod{p^a}$$

and

$$f(x) \not\equiv 0 \pmod{p},$$

then

$$g(x) \equiv 0 \pmod{p^a}.$$

(i) We form $f_1(x)$ from $f(x)$ by rejecting all terms of $f(x)$ whose coef-
ficients are divisible by $p$, and $g_1(x)$ similarly. If $f(x) \not\equiv 0$ and $g(x) \not\equiv 0$,
then the first coefficients in $f_1(x)$ and $g_1(x)$ are not divisible by $p$, and
therefore the first coefficient in $f_1(x)g_1(x)$ is not divisible by $p$. Hence

$$f(x)g(x) \equiv f_1(x)g_1(x) \not\equiv 0 \pmod{p}.$$

(ii) We may reject multiples of $p$ from $f(x)$, and multiples of $p^a$ from
$g(x)$, and the result follows in the same way. This part of the theorem will
be required in Ch. VIII.

If $f(x) \equiv g(x)$, then $f(a) \equiv g(a)$ for all values of $a$. The converse is not
true; thus

$$a^p \equiv a \pmod{p}$$

for all $a$, by Theorem 70, but

$$x^p \equiv x \pmod{p}$$

is false.

7.3. Divisibility of polynomials (mod m). We say that $f(x)$ is divisible
by $g(x)$ to modulus $m$ if there is an integral polynomial $h(x)$ such that

$$f(x) \equiv g(x)h(x) \pmod{m}.$$

We then write

$$g(x) \mid f(x) \pmod{m}.$$

Theorem 105. A necessary and sufficient condition that

$$(x - a) \mid f(x) \pmod{m}$$

is that

$$f(a) \equiv 0 \pmod{m}.$$

If

$$(x - a) \mid f(x) \pmod{m},$$

then

$$f(x) \equiv (x - a)h(x) \pmod{m}$$

for some integral polynomial $h(x)$, and so

$$f(a) \equiv 0 \pmod{m}.$$

The condition is therefore necessary.

It is also sufficient. If

$$f(a) \equiv 0 \pmod{m},$$

then

$$f(x) \equiv f(x) - f(a) \pmod{m}.$$

But

$$f(x) = \sum c_r x^{n-r}$$

and

$$f(x) - f(a) = (x - a)h(x),$$

where

$$h(x) = \frac{f(x) - f(a)}{x - a} = \sum c_r (x^{n-r-1} + x^{n-r-2}a + \cdots + a^{n-r-1})$$

is an integral polynomial. The degree of $h(x)$ is one less than that of $f(x)$.
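The quotient $h(x)$ can be computed without expanding the inner sums, by synthetic (Horner) division, which produces the same coefficients. The sketch below is illustrative; the function name and coefficient ordering (highest degree first) are assumptions made here. It returns $h(x)$ together with the remainder, which is $f(a)$ reduced mod $m$.

```python
def divide_by_linear(coeffs, a, m):
    """Return (h, remainder) with f(x) = (x - a) h(x) + remainder (mod m).

    coeffs lists the coefficients of f from the highest degree down.
    """
    partial = []
    carry = 0
    for c in coeffs:
        carry = (carry * a + c) % m  # next coefficient of the quotient
        partial.append(carry)
    remainder = partial.pop()  # the last value is f(a) mod m
    return partial, remainder
```

For $f(x) = x^4 - 1$, $a = 3$, $m = 16$ the remainder is 0 and $h(x) = x^3 + 3x^2 + 9x + 11$, so that $(x - 3) \mid (x^4 - 1)$ (mod 16).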

7.4. Roots of congruences to a prime modulus. In what follows we 
suppose that the modulus m is prime; it is only in this case that there is a 
simple general theory. We write p for m. 

Theorem 106. If $p$ is prime and

$$f(x) \equiv g(x)h(x) \pmod{p},$$

then any root of $f(x)$ (mod $p$) is a root either of $g(x)$ or of $h(x)$.

If $a$ is any root of $f(x)$ (mod $p$), then

$$f(a) \equiv 0 \pmod{p},$$

or

$$g(a)h(a) \equiv 0 \pmod{p}.$$

Hence $g(a) \equiv 0$ (mod $p$) or $h(a) \equiv 0$ (mod $p$), and so $a$ is a root of $g(x)$ or
of $h(x)$ (mod $p$).

The condition that the modulus is prime is essential. Thus

$$x^2 \equiv x^2 - 4 = (x - 2)(x + 2) \pmod{4},$$

and 4 is a root of $x^2 \equiv 0$ (mod 4) but not of $x - 2 \equiv 0$ (mod 4) or of
$x + 2 \equiv 0$ (mod 4).

Theorem 107. If $f(x)$ is of degree $n$, and has more than $n$ roots (mod $p$),
then

$$f(x) \equiv 0 \pmod{p}.$$

The theorem is significant only when $n < p$. It is true for $n = 1$, by
Theorem 57; and we may therefore prove it by induction.

We assume then that the theorem is true for a polynomial of degree less
than $n$. If $f(x)$ is of degree $n$, and $f(a) \equiv 0$ (mod $p$), then

$$f(x) \equiv (x - a)g(x) \pmod{p},$$

by Theorem 105; and $g(x)$ is at most of degree $n - 1$. By Theorem 106,
any root of $f(x)$ is either $a$ or a root of $g(x)$. If $f(x)$ has more than $n$ roots,
then $g(x)$ must have more than $n - 1$ roots, and so

$$g(x) \equiv 0 \pmod{p},$$

from which it follows that

$$f(x) \equiv 0 \pmod{p}.$$

The condition that the modulus is prime is again essential. Thus

$$x^4 - 1 \equiv 0 \pmod{16}$$

has 8 roots.

The argument proves also

Theorem 108. If $f(x)$ has its full number of roots

$$a_1, a_2, \ldots, a_n \pmod{p},$$

then

$$f(x) \equiv c_0 (x - a_1)(x - a_2) \cdots (x - a_n) \pmod{p}.$$

7.5. Some applications of the general theorems. (1) Fermat’s theorem
shows that the binomial congruence

(7.5.1)  $x^d \equiv 1 \pmod{p}$

has its full number of roots when $d = p - 1$. We can now prove that this
is true when $d$ is any divisor of $p - 1$.

Theorem 109. If $p$ is prime and $d \mid p - 1$, then the congruence (7.5.1)
has $d$ roots.

We have

$$x^{p-1} - 1 = (x^d - 1)g(x),$$

where

$$g(x) = x^{p-1-d} + x^{p-1-2d} + \cdots + x^d + 1.$$

Now $x^{p-1} - 1 \equiv 0$ has $p - 1$ roots, and $g(x) \equiv 0$ has at most $p - 1 - d$. It
follows, by Theorem 106, that $x^d - 1 \equiv 0$ has at least $d$ roots, and therefore
exactly $d$.
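Theorem 109 can be illustrated by counting the roots of (7.5.1) directly. This is a brief sketch with an illustrative function name; it simply tests each of $1, 2, \ldots, p - 1$.

```python
def count_roots_of_unity(d, p):
    """Number of x in 1..p-1 with x^d = 1 (mod p)."""
    return sum(1 for x in range(1, p) if pow(x, d, p) == 1)
```

With $p = 13$ the count equals $d$ for each divisor $d$ of 12. When $d$ does not divide $p - 1$ the congruence need not have $d$ roots; for $d = 5$, $p = 13$ the only root is 1.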

Of the $d$ roots of (7.5.1), some will belong to $d$ in the sense of § 6.8, but
others (for example 1) to smaller divisors of $p - 1$. The number belonging
to $d$ is given by the next theorem.

Theorem 110. Of the $d$ roots of (7.5.1), $\phi(d)$ belong to $d$. In particular,
there are $\phi(p - 1)$ primitive roots of $p$.

If $\psi(d)$ is the number of roots belonging to $d$, then

$$\sum_{d \mid p-1} \psi(d) = p - 1,$$

since each of $1, 2, \ldots, p - 1$ belongs to some $d$; and also

$$\sum_{d \mid p-1} \phi(d) = p - 1$$

by Theorem 63. If we can show that $\psi(d) \le \phi(d)$, it will follow that
$\psi(d) = \phi(d)$, for each $d$.

If $\psi(d) > 0$, then one at any rate of $1, 2, \ldots, p - 1$, say $f$, belongs to $d$.
We consider the $d$ numbers

$$f_h = f^h \quad (0 \le h \le d - 1).$$

Each of these numbers is a root of (7.5.1), since $f^d \equiv 1$ implies $f^{hd} \equiv 1$.
They are incongruent (mod $p$), since $f^h \equiv f^{h'}$, where $h' < h < d$, would
imply $f^k \equiv 1$, where $0 < k = h - h' < d$, and then $f$ would not belong to
$d$; and therefore, by Theorem 109, they are all the roots of (7.5.1). Finally,
if $f_h$ belongs to $d$, then $(h, d) = 1$; for $k \mid h$, $k \mid d$, and $k > 1$ would imply

$$(f^h)^{d/k} = (f^d)^{h/k} \equiv 1,$$

in which case $f_h$ would belong to a smaller index than $d$. Thus $h$ must be one
of the $\phi(d)$ numbers less than and prime to $d$, and therefore $\psi(d) \le \phi(d)$.
We have plainly proved incidentally

Theorem 111. If $p$ is an odd prime, then there are numbers $g$ such that
$1, g, g^2, \ldots, g^{p-2}$ are incongruent mod $p$.
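The equality $\psi(d) = \phi(d)$ can be observed by tabulating the order to which each residue belongs. The following is a minimal sketch; the function names are illustrative, and $\phi$ is computed naively from its definition.

```python
from math import gcd


def order(a, p):
    """Least k >= 1 with a^k = 1 (mod p), for a prime to p."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k


def phi(n):
    """Euler's function, counted directly."""
    return sum(1 for h in range(1, n + 1) if gcd(h, n) == 1)


def order_counts(p):
    """Map each d to psi(d), the number of residues 1..p-1 belonging to d."""
    counts = {}
    for a in range(1, p):
        d = order(a, p)
        counts[d] = counts.get(d, 0) + 1
    return counts
```

For $p = 13$ the table has $\psi(12) = \phi(12) = 4$, the four primitive roots being 2, 6, 7, 11.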

(2) The polynomial

$$f(x) = x^{p-1} - 1$$

is of degree $p - 1$ and, by Fermat’s theorem, has the $p - 1$ roots $1, 2, 3, \ldots,
p - 1$ (mod $p$). Applying Theorem 108, we obtain

Theorem 112. If $p$ is prime, then

(7.5.2)  $x^{p-1} - 1 \equiv (x - 1)(x - 2) \cdots (x - p + 1) \pmod{p}.$

If we compare the constant terms, we obtain a new proof of Wilson’s
theorem. If we compare the coefficients of $x^{p-2}, x^{p-3}, \ldots, x$, we obtain

Theorem 113. If $p$ is an odd prime, $1 \le l < p - 1$, and $A_l$ is the sum of
the products of $l$ different members of the set $1, 2, \ldots, p - 1$, then $A_l \equiv 0$
(mod $p$).

We can use Theorem 112 to prove Theorem 76. We suppose $p$ odd.

Suppose that

$$n = rp - s \quad (r \ge 1, \ 0 \le s < p).$$

Then

$$\binom{p + n - 1}{n} = \frac{(rp - s + p - 1)!}{(rp - s)!\,(p - 1)!} = \frac{(rp - s + 1)(rp - s + 2) \cdots (rp - s + p - 1)}{(p - 1)!}$$

is an integer $i$, and

$$(rp - s + 1)(rp - s + 2) \cdots (rp - s + p - 1) = (p - 1)!\,i \equiv -i \pmod{p},$$

by Wilson’s theorem (Theorem 80). But the left-hand side is congruent to

$$(s - 1)(s - 2) \cdots (s - p + 1) \equiv s^{p-1} - 1 \pmod{p},$$

by Theorem 112, and is therefore congruent to $-1$ when $s = 0$ and to 0 otherwise.


7.6. Lagrange’s proof of Fermat’s and Wilson’s theorems. We based 
our proof of Theorem 112 on Fermat’s theorem and on Theorem 108. 
Lagrange, the discoverer of the theorem, proved it directly, and his 
argument contains another proof of Fermat’s theorem. 

We suppose $p$ odd. Then

(7.6.1)  $(x - 1)(x - 2) \cdots (x - p + 1) = x^{p-1} - A_1 x^{p-2} + \cdots + A_{p-1},$

where $A_1, \ldots$ are defined as in Theorem 113. If we multiply both sides by
$x$ and change $x$ into $x - 1$, we have

$$(x - 1)^p - A_1 (x - 1)^{p-1} + \cdots + A_{p-1}(x - 1) = (x - 1)(x - 2) \cdots (x - p)$$
$$= (x - p)(x^{p-1} - A_1 x^{p-2} + \cdots + A_{p-1}).$$

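Equating coefficients, we obtain

$$\binom{p}{1} + A_1 = p + A_1, \qquad \binom{p}{2} + \binom{p-1}{1} A_1 + A_2 = pA_1 + A_2,$$

$$\binom{p}{3} + \binom{p-1}{2} A_1 + \binom{p-2}{1} A_2 + A_3 = pA_2 + A_3,$$

and so on. The first equation is an identity; the others yield in succession

$$A_1 = \binom{p}{2}, \qquad 2A_2 = \binom{p}{3} + \binom{p-1}{2} A_1,$$

$$3A_3 = \binom{p}{4} + \binom{p-1}{3} A_1 + \binom{p-2}{2} A_2, \quad \ldots,$$

$$(p - 1)A_{p-1} = 1 + A_1 + A_2 + \cdots + A_{p-2}.$$

Hence we deduce successively

(7.6.2)  $p \mid A_1, \quad p \mid A_2, \quad \ldots, \quad p \mid A_{p-2},$

and finally

$$(p - 1)A_{p-1} \equiv 1 \pmod{p}$$

or

(7.6.3)  $A_{p-1} \equiv -1 \pmod{p}.$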

Since $A_{p-1} = (p - 1)!$, (7.6.3) is Wilson’s theorem; and (7.6.2) and
(7.6.3) together give Theorem 112. Finally, since

$$(x - 1)(x - 2) \cdots (x - p + 1) \equiv 0 \pmod{p}$$

for any $x$ which is not a multiple of $p$, Fermat’s theorem follows as a
corollary.
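Lagrange's successive equations amount to a recurrence, $kA_k = \sum_{j=0}^{k-1} \binom{p-j}{k+1-j} A_j$ with $A_0 = 1$, obtained by comparing the coefficients of $x^{p-1-k}$. The sketch below computes the $A_k$ from this recurrence; the general form of the recurrence and the function name are written out here as an assumption drawn from the pattern of the first few equations, not quoted from the text.

```python
from math import comb


def lagrange_coefficients(p):
    """A_0, ..., A_{p-1} from k*A_k = sum_{j<k} C(p-j, k+1-j) * A_j, with A_0 = 1."""
    A = [1]
    for k in range(1, p):
        total = sum(comb(p - j, k + 1 - j) * A[j] for j in range(k))
        A.append(total // k)  # the division is exact
    return A
```

For $p = 7$ this gives 1, 21, 175, 735, 1624, 1764, 720; the middle five are multiples of 7 and $A_6 = 6! \equiv -1$ (mod 7).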

7.7. The residue of $\{\frac{1}{2}(p - 1)\}!$. Suppose that $p$ is an odd prime and

$$\varpi = \tfrac{1}{2}(p - 1).$$

From

$$(p - 1)! = 1 \cdot 2 \cdots \tfrac{1}{2}(p - 1) \{p - \tfrac{1}{2}(p - 1)\} \{p - \tfrac{1}{2}(p - 3)\} \cdots (p - 1)$$
$$\equiv (-1)^{\varpi} (\varpi!)^2 \pmod{p}$$

it follows, by Wilson’s theorem, that

$$(\varpi!)^2 \equiv (-1)^{\varpi - 1} \pmod{p}.$$

We must now distinguish the two cases $p = 4n + 1$ and $p = 4n + 3$.
If $p = 4n + 1$, then

$$(\varpi!)^2 \equiv -1 \pmod{p},$$

so that (as we proved otherwise in § 6.6) $-1$ is a quadratic residue of $p$. In
this case $\varpi!$ is congruent to one or other of the roots of $x^2 \equiv -1$ (mod $p$).
If $p = 4n + 3$, then

(7.7.1)  $(\varpi!)^2 \equiv 1 \pmod{p},$

(7.7.2)  $\varpi! \equiv \pm 1 \pmod{p}.$

Since $-1$ is a non-residue of $p$, the sign in (7.7.2) is positive or negative
according as $\varpi!$ is a residue or non-residue of $p$. But $\varpi!$ is the product of
the positive integers less than $\frac{1}{2}p$, and therefore, by Theorem 85, the sign
in (7.7.2) is positive or negative according as the number of non-residues
of $p$ less than $\frac{1}{2}p$ is even or odd.

Theorem 114. If $p$ is a prime $4n + 3$, then

$$\{\tfrac{1}{2}(p - 1)\}! \equiv (-1)^{\nu} \pmod{p},$$

where $\nu$ is the number of quadratic non-residues less than $\frac{1}{2}p$.
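Theorem 114 can be checked for small primes of the form $4n + 3$. This is a minimal sketch with illustrative function names; the quadratic residues are found by squaring, and the sign helper assumes that $p$ is such a prime, so that the factorial is congruent to $\pm 1$.

```python
from math import factorial


def half_factorial_sign(p):
    """Sign of ((p-1)/2)! mod p; assumes p is a prime of the form 4n + 3."""
    r = factorial((p - 1) // 2) % p
    return 1 if r == 1 else -1


def nonresidues_below_half(p):
    """Number of quadratic non-residues of p among 1, ..., (p-1)/2."""
    residues = {x * x % p for x in range(1, p)}
    return sum(1 for a in range(1, (p - 1) // 2 + 1) if a not in residues)
```

For $p = 7$ we have $3! = 6 \equiv -1$, and the only non-residue less than $\frac{7}{2}$ is 3, so that $\nu = 1$.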

7.8. A theorem of Wolstenholme. It follows from Theorem 113 that
the numerator of the fraction

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p - 1}$$

is divisible by $p$; in fact the numerator is the $A_{p-2}$ of that theorem. We can,
however, go farther.

Theorem 115. If $p$ is a prime greater than 3, then the numerator of the
fraction

(7.8.1)  $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p - 1}$

is divisible by $p^2$.

The result is false when $p = 3$. It is irrelevant whether the fraction is or
is not reduced to its lowest terms, since in any case the denominator cannot
be divisible by $p$.

The theorem may be stated in a different form. If $i$ is prime to $m$, the
congruence

$$ix \equiv 1 \pmod{m}$$

has just one root, which we call the associate of $i$ (mod $m$).† We may denote
this associate by $\bar{\imath}$, but it is often convenient, when it is plain that we are
concerned with an integer, to use the notation

$$\frac{1}{i}$$

(or $1/i$). More generally we may, in similar circumstances, use

$$\frac{b}{a}$$

(or $b/a$) for the solution of $ax \equiv b$.

We may then (as we shall see in a moment) state Wolstenholme’s theorem
in the form

Theorem 116. If $p > 3$, and $1/i$ is the associate of $i$ (mod $p^2$), then

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p - 1} \equiv 0 \pmod{p^2}.$$

We may elucidate the notation by proving first that

(7.8.2)  $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p - 1} \equiv 0 \pmod{p}.$‡

For this, we have only to observe that, if $0 < i < p$, then

$$i \cdot \frac{1}{i} \equiv 1, \qquad (p - i) \cdot \frac{1}{p - i} \equiv 1 \pmod{p}.$$

Hence

$$i \left( \frac{1}{i} + \frac{1}{p - i} \right) \equiv i \cdot \frac{1}{i} - (p - i) \cdot \frac{1}{p - i} \equiv 0 \pmod{p},$$

$$\frac{1}{i} + \frac{1}{p - i} \equiv 0 \pmod{p},$$

and the result follows by summation.

† As in § 6.5, the $a$ of § 6.5 being now 1.

‡ Here, naturally, $1/i$ is the associate of $i$ (mod $p$). This is determinate (mod $p$), but indeterminate
(mod $p^2$) to the extent of an arbitrary multiple of $p$.

We show next that the two forms of Wolstenholme’s theorem (Theo-
rems 115 and 116) are equivalent. If $0 < x < p$ and $\bar{x}$ is the associate of $x$
(mod $p^2$), then

$$\bar{x}(p - 1)! = x\bar{x}\,\frac{(p - 1)!}{x} \equiv \frac{(p - 1)!}{x} \pmod{p^2}.$$

Hence

$$(p - 1)!\,(\bar{1} + \bar{2} + \cdots + \overline{p - 1}) \equiv (p - 1)! \left( 1 + \frac{1}{2} + \cdots + \frac{1}{p - 1} \right) \pmod{p^2},$$

the fractions on the right having their common interpretation; and the
equivalence follows.

To prove the theorem itself we put $x = p$ in the identity (7.6.1). This
gives

$$(p - 1)! = p^{p-1} - A_1 p^{p-2} + \cdots - A_{p-2}\,p + A_{p-1}.$$

But $A_{p-1} = (p - 1)!$, and therefore

$$p^{p-2} - A_1 p^{p-3} + \cdots + A_{p-3}\,p - A_{p-2} = 0.$$

Since $p > 3$ and

$$p \mid A_1, \quad p \mid A_2, \quad \ldots, \quad p \mid A_{p-3},$$

by Theorem 113, it follows that $p^2 \mid A_{p-2}$, i.e.

$$(p - 1)! \left( 1 + \frac{1}{2} + \cdots + \frac{1}{p - 1} \right) \equiv 0 \pmod{p^2}.$$

This is equivalent to Wolstenholme’s theorem.
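Both forms of Wolstenholme's theorem are easily tested for small primes. The following sketch is illustrative only (the function names are not from the text): the first function uses exact rational arithmetic for the numerator of (7.8.1), the second sums the associates (mod $p^2$), computed here with Python's built-in modular inverse `pow(i, -1, m)`, which requires Python 3.8 or later.

```python
from fractions import Fraction


def harmonic_numerator(p):
    """Numerator, in lowest terms, of 1 + 1/2 + ... + 1/(p-1)."""
    return sum(Fraction(1, i) for i in range(1, p)).numerator


def associate_sum(p):
    """Sum of the associates of 1, ..., p-1 (mod p^2), reduced mod p^2."""
    m = p * p
    return sum(pow(i, -1, m) for i in range(1, p)) % m
```

For $p = 5$ the fraction is $25/12$; for $p = 3$ it is $3/2$, whose numerator is not divisible by 9.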
The numerator of

$$C_p = 1 + \frac{1}{2^2} + \cdots + \frac{1}{(p - 1)^2}$$

is $A_{p-2}^2 - 2A_{p-1}A_{p-3}$, and is therefore divisible by $p$. Hence

Theorem 117. If $p > 3$, then $C_p \equiv 0$ (mod $p$).

7.9. The theorem of von Staudt. We conclude this chapter by proving
a famous theorem of von Staudt concerning Bernoulli’s numbers.

Bernoulli’s numbers are usually defined as the coefficients in the
expansion†

$$\frac{x}{e^x - 1} = 1 - \frac{1}{2}x + \frac{B_1}{2!}x^2 - \frac{B_2}{4!}x^4 + \cdots.$$



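We shall find it convenient to write

$$\frac{x}{e^x - 1} = \beta_0 + \frac{\beta_1}{1!}x + \frac{\beta_2}{2!}x^2 + \frac{\beta_3}{3!}x^3 + \cdots,$$

so that $\beta_0 = 1$, $\beta_1 = -\frac{1}{2}$ and

$$\beta_{2k} = (-1)^{k-1} B_k, \qquad \beta_{2k+1} = 0 \quad (k \ge 1).$$

The importance of the numbers comes primarily from their occurrence in
the ‘Euler-Maclaurin sum-formula’ for $\sum m^k$. In fact

(7.9.1)  $1^k + 2^k + \cdots + (n - 1)^k = \sum_{r=0}^{k} \frac{1}{k + 1 - r} \binom{k}{r} n^{k+1-r} \beta_r$

for $k \ge 1$. For the left-hand side is the coefficient of $x^{k+1}$ in

$$k!\,x(1 + e^x + e^{2x} + \cdots + e^{(n-1)x}) = k!\,x\,\frac{1 - e^{nx}}{1 - e^x} = k!\,\frac{x}{e^x - 1}(e^{nx} - 1)$$

$$= k! \left( 1 + \frac{\beta_1}{1!}x + \frac{\beta_2}{2!}x^2 + \cdots \right) \left( nx + \frac{n^2 x^2}{2!} + \cdots \right);$$

and (7.9.1) follows by picking out the coefficient in this product.

Von Staudt’s theorem determines the fractional part of $B_k$.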


Theorem 118. If $k \ge 1$, then

(7.9.2)  $(-1)^k B_k \equiv \sum \frac{1}{p} \pmod{1},$

the summation being extended over the primes $p$ such that $(p - 1) \mid 2k$.

† This expansion is convergent whenever $|x| < 2\pi$.



For example, if $k = 1$, then $(p - 1) \mid 2$, which is true if $p = 2$ or $p = 3$.
Hence $-B_1 \equiv \frac{1}{2} + \frac{1}{3} = \frac{5}{6}$ (mod 1); and in fact $B_1 = \frac{1}{6}$. When we restate (7.9.2) in
terms of the $\beta$, it becomes

(7.9.3)  $\beta_k + \sum_{(p-1) \mid k} \frac{1}{p} = i,$

where

(7.9.4)  $k = 1, 2, 4, 6, \ldots$

and $i$ is an integer. If we define $\epsilon_k(p)$ by

$$\epsilon_k(p) = 1 \ ((p - 1) \mid k), \qquad \epsilon_k(p) = 0 \ ((p - 1) \nmid k),$$

then (7.9.3) takes the form

(7.9.5)  $\beta_k + \sum \frac{\epsilon_k(p)}{p} = i,$

where $p$ now runs through all primes.

In particular von Staudt’s theorem shows that there is no squared factor
in the denominator of any Bernoullian number.
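The form (7.9.3) of the theorem is convenient for a numerical check. In the sketch below the $\beta_n$ are computed exactly from the standard recurrence $\sum_{j=0}^{n} \binom{n+1}{j} \beta_j = 0$ ($n \ge 1$), which follows from multiplying the series for $x/(e^x - 1)$ by $e^x - 1$; this recurrence and the function names are supplied here as assumptions and are not taken from the text.

```python
from fractions import Fraction
from math import comb


def betas(n_max):
    """beta_0, ..., beta_{n_max}, the coefficients of x^n / n! in x / (e^x - 1)."""
    beta = [Fraction(1)]
    for n in range(1, n_max + 1):
        s = sum(comb(n + 1, j) * beta[j] for j in range(n))
        beta.append(-s / (n + 1))
    return beta


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def von_staudt_value(k):
    """beta_k plus the sum of 1/p over primes p with (p - 1) | k."""
    primes = [p for p in range(2, k + 2) if is_prime(p) and k % (p - 1) == 0]
    return betas(k)[k] + sum(Fraction(1, p) for p in primes)
```

For $k = 12$ we have $\beta_{12} = -\frac{691}{2730}$, and the primes concerned are 2, 3, 5, 7, 13; the value returned is the integer 1.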

7.10. Proof of von Staudt’s theorem. The proof of Theorem 118
depends upon the following lemma.

Theorem 119:

$$\sum_{m=1}^{p-1} m^k \equiv -\epsilon_k(p) \pmod{p}.$$

If $(p - 1) \mid k$, then $m^k \equiv 1$, by Fermat’s theorem, and

$$\sum m^k \equiv p - 1 \equiv -1 \equiv -\epsilon_k(p) \pmod{p}.$$

If $(p - 1) \nmid k$, and $g$ is a primitive root of $p$, then

(7.10.1)  $g^k \not\equiv 1 \pmod{p}$

by Theorem 88. The sets $g, 2g, \ldots, (p - 1)g$ and $1, 2, \ldots, p - 1$ are equivalent
(mod $p$), and therefore

$$\sum (mg)^k \equiv \sum m^k \pmod{p},$$

$$(g^k - 1) \sum m^k \equiv 0 \pmod{p},$$

and

$$\sum m^k \equiv 0 = -\epsilon_k(p) \pmod{p},$$

by (7.10.1). Thus $\sum m^k \equiv -\epsilon_k(p)$ in any case.
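The lemma is simple to verify by direct summation. This is a short sketch with illustrative names; `eps` plays the part of $\epsilon_k(p)$.

```python
def power_sum_mod(k, p):
    """1^k + 2^k + ... + (p-1)^k reduced mod p."""
    return sum(pow(m, k, p) for m in range(1, p)) % p


def eps(k, p):
    """epsilon_k(p): 1 if (p - 1) divides k, otherwise 0."""
    return 1 if k % (p - 1) == 0 else 0
```

For $p = 5$ the sum is congruent to $-1$ when $k$ is a multiple of 4 and to 0 otherwise.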

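We now prove Theorem 118 by induction, assuming that it is true for any
number $l$ of the sequence (7.9.4) less than $k$, and deducing that it is true for
$k$. In what follows $k$ and $l$ belong to (7.9.4), $r$ runs from 0 to $k$, $\beta_0 = 1$, and
$\beta_3 = \beta_5 = \cdots = 0$. We have already verified the theorem when $k = 2$,
and we may suppose $k > 2$.

It follows from (7.9.1) and Theorem 119 that, if $\varpi$ is any prime,

$$\epsilon_k(\varpi) + \sum_{r=0}^{k} \frac{1}{k + 1 - r} \binom{k}{r} \varpi^{k+1-r} \beta_r \equiv 0 \pmod{\varpi}$$

or

(7.10.2)  $\beta_k + \frac{\epsilon_k(\varpi)}{\varpi} + \sum_{r=0}^{k-2} \frac{1}{k + 1 - r} \binom{k}{r} \varpi^{k-1-r} (\varpi \beta_r) \equiv 0 \pmod{1};$

there is no term in $\beta_{k-1}$, since $\beta_{k-1} = 0$. We consider whether the
denominator of

$$u_{k,r} = \frac{1}{k + 1 - r} \binom{k}{r} \varpi^{k-1-r} (\varpi \beta_r)$$

can be divisible by $\varpi$.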

If $r$ is not an $l$, $\beta_r$ is 1 or 0. If $r$ is an $l$, then, by the inductive hypothesis, the
denominator of $\beta_r$ has no squared factor,† and that of $\varpi \beta_r$ is not divisible by
$\varpi$. The factor $\binom{k}{r}$ is integral. Hence the denominator of $u_{k,r}$ is divisible
by $\varpi$ only if that of

$$\frac{\varpi^{k-1-r}}{k + 1 - r} = \frac{\varpi^{s-1}}{s + 1}$$

is divisible by $\varpi$. In this case

$$s + 1 \ge \varpi^s.$$

But $s = k - r \ge 2$, and therefore

$$s + 1 < 2^s \le \varpi^s,$$

a contradiction. It follows that the denominator of $u_{k,r}$ is not divisible
by $\varpi$.

† It will be observed that we do not need the full force of the inductive hypothesis.

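Hence

$$\beta_k + \frac{\epsilon_k(\varpi)}{\varpi} = \frac{a_k}{b_k},$$

where $\varpi \nmid b_k$; and

$$\frac{\epsilon_k(p)}{p} \quad (p \ne \varpi)$$

is obviously of the same form. It follows that

(7.10.3)  $\beta_k + \sum \frac{\epsilon_k(p)}{p} = \frac{A_k}{B_k},$

where $B_k$ is not divisible by $\varpi$. Since $\varpi$ is an arbitrary prime, $B_k$ must be
1. Hence the right-hand side of (7.10.3) is an integer; and this proves the
theorem.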

Suppose in particular that $k$ is a prime of the form $3n + 1$. Then $(p - 1) \mid 2k$
only if $p$ is one of 2, 3, $k + 1$, $2k + 1$. But $k + 1$ is even, and $2k + 1 = 6n + 3$
is divisible by 3, so that 2 and 3 are the only permissible values of $p$. Hence

Theorem 120: If $k$ is a prime of the form $3n + 1$, then

$$B_k \equiv \tfrac{1}{6} \pmod{1}.$$

The argument can be developed to prove that if $k$ is given, there are an
infinity of $l$ for which $B_l$ has the same fractional part as $B_k$; but for this we
need Dirichlet’s Theorem 15 (or the special case of the theorem in which
$b = 1$).


NOTES 

§§ 7.2-4. For the most part we follow Hecke, § 3. 

§ 7.6. Lagrange, Nouveaux mémoires de l’Académie royale de Berlin, 2 (1773), 125
(Œuvres, iii. 425). This was the first published proof of Wilson’s theorem.

§ 7.7. Dirichlet, Journal fur Math. 3 (1828), 407-8 ( Werke , i. 107-8). 

§ 7.8. Wolstenholme, Quarterly Journal of Math. 5 (1862), 35-39. There are many
generalizations of Theorem 115, some of which are also generalizations of Theorem 113.
See § 8.7.

The theorem has generally been described as ‘Wolstenholme’s theorem’, and we follow
the usual practice. But N. Rama Rao [Bull. Calcutta Math. Soc. 29 (1938), 167-70] has
pointed out that it, and a good many of its extensions, had been anticipated by Waring,
Meditationes algebraicae, ed. 2 (1782), 383.

§§ 7.9-10. von Staudt, Journal für Math. 21 (1840), 372-4. The theorem was discovered
independently by Clausen, Astronomische Nachrichten, 17 (1840), 352. We follow a proof
by R. Rado, Journal London Math. Soc. 9 (1934), 85-8.

Many authors use the notation

$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!} x^n,$$

so that their $B_n$ is our $\beta_n$.

Theorem 120, and the more general theorem referred to in connexion with it, are due to
Rado (ibid. 88-90). Indeed Erdős and Wagstaff (Illinois J. Math. 24 (1980), 104-12) have
shown, for given $k$, that one has $B_m \equiv B_k$ (mod 1) for a positive proportion of values of $m$.



VIII


CONGRUENCES TO COMPOSITE MODULI


8.1. Linear congruences. We have supposed since § 7.4 (apart from a 
momentary digression in § 7.8) that the modulus m is prime. In this chapter 
we prove a few theorems concerning congruences to general moduli. The 
theory is much less simple when the modulus is composite, and we shall 
not attempt any systematic discussion. 

We considered the general linear congruence

(8.1.1)  $ax \equiv b \pmod{m}$

in § 5.4, and it will be convenient to recall our results. The congruence is
insoluble unless

(8.1.2)  $d = (a, m) \mid b.$

If this condition is satisfied, then (8.1.1) has just $d$ solutions, viz.

$$\xi, \quad \xi + \frac{m}{d}, \quad \xi + 2\frac{m}{d}, \quad \ldots, \quad \xi + (d - 1)\frac{m}{d},$$

where $\xi$ is the unique solution of

$$\frac{a}{d}x \equiv \frac{b}{d} \pmod{\frac{m}{d}}.$$
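The recipe just recalled translates directly into a short routine. This is a sketch under the assumption that $m/d > 1$; the function name is illustrative, and the inverse of $a/d$ to modulus $m/d$ is obtained with Python's `pow(x, -1, n)` (Python 3.8 or later).

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """All solutions x (mod m) of a*x = b (mod m), or [] if (a, m) does not divide b."""
    d = gcd(a, m)
    if b % d != 0:
        return []
    a1, b1, m1 = a // d, b // d, m // d
    xi = b1 * pow(a1, -1, m1) % m1  # unique solution of (a/d) x = b/d (mod m/d)
    return [xi + j * m1 for j in range(d)]
```

For example $6x \equiv 4$ (mod 10) has $d = 2$ and the two solutions 4 and 9, while $6x \equiv 3$ (mod 10) is insoluble.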

We consider next a system

(8.1.3)  $a_1 x \equiv b_1 \pmod{m_1}, \quad a_2 x \equiv b_2 \pmod{m_2}, \quad \ldots, \quad a_k x \equiv b_k \pmod{m_k}$

of linear congruences to coprime moduli $m_1, m_2, \ldots, m_k$. The system will
be insoluble unless $(a_i, m_i) \mid b_i$ for every $i$. If this condition is satisfied, we
can solve each congruence separately, and the problem is reduced to that
of the solution of the system

(8.1.4)  $x \equiv c_1 \pmod{m_1}, \quad x \equiv c_2 \pmod{m_2}, \quad \ldots, \quad x \equiv c_k \pmod{m_k}.$

The $m_i$ here are not the same as in (8.1.3); in fact the $m_i$ of (8.1.4) is
$m_i/(a_i, m_i)$ in the notation of (8.1.3).

We write

$$m = m_1 m_2 \cdots m_k = m_1 M_1 = m_2 M_2 = \cdots = m_k M_k.$$

Since $(m_i, M_i) = 1$, there is an $n_i$ (unique to modulus $m_i$) such that

$$n_i M_i \equiv 1 \pmod{m_i}.$$

If

(8.1.5)  $x = n_1 M_1 c_1 + n_2 M_2 c_2 + \cdots + n_k M_k c_k,$

then $x \equiv n_i M_i c_i \equiv c_i$ (mod $m_i$) for every $i$, so that $x$ satisfies (8.1.4).
If $y$ satisfies (8.1.4), then

$$y \equiv c_i \equiv x \pmod{m_i}$$

for every $i$, and therefore (since the $m_i$ are coprime), $y \equiv x$ (mod $m$). Hence
the solution $x$ is unique (mod $m$).

Theorem 121. If $m_1, m_2, \ldots, m_k$ are coprime, then the system (8.1.4)
has a unique solution (mod $m$) given by (8.1.5).
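Formula (8.1.5) can be evaluated mechanically. The following is a minimal sketch, assuming that the moduli are coprime in pairs and each exceeds 1; the function name is illustrative, and the $n_i$ are found with Python's modular inverse `pow(M, -1, m)` (Python 3.8 or later).

```python
from math import prod


def crt(residues, moduli):
    """Least non-negative x with x = c_i (mod m_i), by formula (8.1.5)."""
    m = prod(moduli)
    x = 0
    for c, m_i in zip(residues, moduli):
        M_i = m // m_i
        n_i = pow(M_i, -1, m_i)  # n_i * M_i = 1 (mod m_i)
        x += n_i * M_i * c
    return x % m
```

For instance, the system $x \equiv 2$ (mod 3), $x \equiv 3$ (mod 5), $x \equiv 2$ (mod 7) has the solution $x \equiv 23$ (mod 105).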

The problem is more complicated when the moduli are not coprime. We content ourselves 
with an illustration. 

Six professors begin courses of lectures on Monday, Tuesday, Wednesday, Thursday, 
Friday, and Saturday, and announce their intentions of lecturing at intervals of two, three, 
four, one, six, and five days respectively. The regulations of the university forbid Sunday 
lectures (so that a Sunday lecture must be omitted). When first will all six professors find 
themselves compelled to omit a lecture? 

If the day in question is the $x$th (counting from and including the first Monday), then

$$x = 1 + 2k_1 = 2 + 3k_2 = 3 + 4k_3 = 4 + k_4 = 5 + 6k_5 = 6 + 5k_6 = 7k_7,$$

where the $k$ are integers; i.e.

(1) $x \equiv 1$ (mod 2), (2) $x \equiv 2$ (mod 3), (3) $x \equiv 3$ (mod 4),

(4) $x \equiv 4$ (mod 1), (5) $x \equiv 5$ (mod 6), (6) $x \equiv 6$ (mod 5),

(7) $x \equiv 0$ (mod 7).

Of these congruences, (4) is no restriction, and (1) and (2) are included in (3) and (5). Of the
two latter, (3) shows that $x$ is congruent to 3, 7, or 11 (mod 12), and (5) that $x$ is congruent
to 5 or 11, so that (3) and (5) together are equivalent to $x \equiv 11$ (mod 12). Hence the problem
is that of solving

$$x \equiv 11 \pmod{12}, \quad x \equiv 6 \pmod{5}, \quad x \equiv 0 \pmod{7}$$

or

$$x \equiv -1 \pmod{12}, \quad x \equiv 1 \pmod{5}, \quad x \equiv 0 \pmod{7}.$$

This is a case of the problem solved by Theorem 121. Here

$$m_1 = 12, \quad m_2 = 5, \quad m_3 = 7, \quad m = 420,$$

$$M_1 = 35, \quad M_2 = 84, \quad M_3 = 60.$$

The $n$ are given by

$$35n_1 \equiv 1 \pmod{12}, \quad 84n_2 \equiv 1 \pmod{5}, \quad 60n_3 \equiv 1 \pmod{7},$$

or

$$-n_1 \equiv 1 \pmod{12}, \quad -n_2 \equiv 1 \pmod{5}, \quad 4n_3 \equiv 1 \pmod{7};$$

and we can take $n_1 = -1$, $n_2 = -1$, $n_3 = 2$. Hence

$$x \equiv (-1)(-1)35 + (-1)1.84 + 2.0.60 = -49 \equiv 371 \pmod{420}.$$

The first $x$ satisfying the condition is 371.
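The answer can be confirmed by a direct search that makes no use of Theorem 121 and so does not require the moduli to be coprime. In this sketch (names illustrative) each professor is represented by the day of the first lecture and the interval between lectures, and only Sundays (multiples of 7) are examined.

```python
def first_common_sunday():
    """First day x (a multiple of 7) on which all six professors are due to lecture."""
    # (day of first lecture, interval in days) for Monday through Saturday starters
    schedule = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 5)]
    x = 7
    while True:
        if all((x - start) % step == 0 for start, step in schedule):
            return x
        x += 7
```

The search stops at the 371st day, in agreement with the result above.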

8.2. Congruences of higher degree. We can now reduce the solution
of the general congruence†

(8.2.1) $f(x) \equiv 0 \pmod{m}$,

where $f(x)$ is any integral polynomial, to that of a number of congruences
whose moduli are powers of primes.

Suppose that

$m = m_1 m_2 \cdots m_k$,

no two $m_i$ having a common factor. Every solution of (8.2.1) satisfies

(8.2.2) $f(x) \equiv 0 \pmod{m_i}$ $(i = 1, 2, \ldots, k)$.


† See § 7.2.
If $c_1, c_2, \ldots, c_k$ is a set of solutions of (8.2.2), and $x$ is the solution of

(8.2.3) $x \equiv c_i \pmod{m_i}$ $(i = 1, 2, \ldots, k)$,

given by Theorem 121, then

$f(x) \equiv f(c_i) \equiv 0 \pmod{m_i}$

and therefore $f(x) \equiv 0 \pmod{m}$. Thus every set of solutions of (8.2.2)
gives a solution of (8.2.1), and conversely. In particular

Theorem 122. The number of roots of (8.2.1) is the product of the
numbers of roots of the separate congruences (8.2.2).

If $m = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$, we may take $m_i = p_i^{a_i}$.
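
Theorem 122 is easy to test by direct enumeration. The sketch below is illustrative: the polynomial $x^2 - 1$ and the modulus $120 = 8 \cdot 3 \cdot 5$ are choices made for this example, and `count_roots` is a hypothetical helper, not notation from the text.

```python
def count_roots(coeffs, m):
    """Count x in 0..m-1 with f(x) = 0 (mod m); coeffs run from the highest degree down."""
    def f(x):
        value = 0
        for c in coeffs:
            value = (value * x + c) % m   # Horner's rule, reduced mod m
        return value
    return sum(1 for x in range(m) if f(x) == 0)

coeffs = [1, 0, -1]        # f(x) = x^2 - 1
parts = [8, 3, 5]          # coprime prime-power factors of 120

product = 1
for mi in parts:
    product *= count_roots(coeffs, mi)

print(count_roots(coeffs, 120), product)   # 16 16
```

Here the separate congruences have 4, 2, and 2 roots, and the congruence to modulus 120 has $4 \cdot 2 \cdot 2 = 16$.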

8.3. Congruences to a prime-power modulus. We have now to
consider the congruence

(8.3.1) $f(x) \equiv 0 \pmod{p^a}$

where $p$ is prime and $a > 1$.

Suppose first that $x$ is a root of (8.3.1) for which

(8.3.2) $0 \le x < p^a$.

Then $x$ satisfies

(8.3.3) $f(x) \equiv 0 \pmod{p^{a-1}}$,

and is of the form

(8.3.4) $\xi + sp^{a-1}$ $(0 \le s < p)$,

where $\xi$ is a root of (8.3.3) for which

(8.3.5) $0 \le \xi < p^{a-1}$.

Next, if $\xi$ is a root of (8.3.3) satisfying (8.3.5), then

$f(\xi + sp^{a-1}) = f(\xi) + sp^{a-1}f'(\xi) + \tfrac12 s^2 p^{2a-2} f''(\xi) + \cdots$

$\equiv f(\xi) + sp^{a-1}f'(\xi) \pmod{p^a}$,
since $2a - 2 \ge a$, $3a - 3 \ge a$, ..., and the coefficients in

$\frac{f^{(k)}(\xi)}{k!}$

are integers. We have now to distinguish two cases.

(1) Suppose that

(8.3.6) $f'(\xi) \not\equiv 0 \pmod{p}$.

Then $\xi + sp^{a-1}$ is a root of (8.3.1) if and only if

$f(\xi) + sp^{a-1}f'(\xi) \equiv 0 \pmod{p^a}$

or

$sf'(\xi) \equiv -\frac{f(\xi)}{p^{a-1}} \pmod{p}$,

and there is just one $s$ (mod $p$) satisfying this condition. Hence the number
of roots of (8.3.3) is the same as the number of roots of (8.3.1).

(2) Suppose that

(8.3.7) $f'(\xi) \equiv 0 \pmod{p}$.

Then

$f(\xi + sp^{a-1}) \equiv f(\xi) \pmod{p^a}$.

If $f(\xi) \not\equiv 0 \pmod{p^a}$, then (8.3.1) is insoluble. If $f(\xi) \equiv 0 \pmod{p^a}$,
then (8.3.4) is a solution of (8.3.1) for every $s$, and there are $p$ solutions of
(8.3.1) corresponding to every solution of (8.3.3).

Theorem 123. The number of solutions of (8.3.1) corresponding to a
solution $\xi$ of (8.3.3) is

(a) none, if $f'(\xi) \equiv 0 \pmod{p}$ and $\xi$ is not a solution of (8.3.1);

(b) one, if $f'(\xi) \not\equiv 0 \pmod{p}$;

(c) $p$, if $f'(\xi) \equiv 0 \pmod{p}$ and $\xi$ is a solution of (8.3.1).

The solutions of (8.3.1) corresponding to $\xi$ may be derived from $\xi$, in
case (b) by the solution of a linear congruence, in case (c) by adding any
multiple of $p^{a-1}$ to $\xi$.
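
The three cases of Theorem 123 translate directly into a lifting step from modulus $p^{a-1}$ to modulus $p^a$. The Python sketch below is a minimal illustration under stated assumptions: polynomials are lists of integer coefficients from the highest degree down, the names `poly_eval`, `derivative`, and `lift` are hypothetical, and the inverse in case (b) uses `pow(..., -1, p)` (Python 3.8 or later).

```python
def poly_eval(coeffs, x, m):
    """Value of the polynomial at x, reduced mod m (Horner's rule)."""
    value = 0
    for c in coeffs:
        value = (value * x + c) % m
    return value

def derivative(coeffs):
    """Coefficients of f'(x), again from the highest degree down."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def lift(coeffs, xi, p, a):
    """Roots of f mod p^a of the form xi + s*p^(a-1), where xi is a root mod p^(a-1)."""
    q = p ** (a - 1)
    fx = poly_eval(coeffs, xi, p ** a)        # a multiple of p^(a-1)
    dfx = poly_eval(derivative(coeffs), xi, p)
    if dfx != 0:
        # Case (b): solve s * f'(xi) = -f(xi)/p^(a-1) (mod p).
        s = (-(fx // q) * pow(dfx, -1, p)) % p
        return [xi + s * q]
    if fx != 0:
        return []                             # case (a): no root above xi
    return [xi + s * q for s in range(p)]     # case (c): every s works

f = [1, 0, -2]                 # x^2 - 2, with root 3 (mod 7)
print(lift(f, 3, 7, 2))        # [10]
print(lift(f, 10, 7, 3))       # [108]
```

Indeed $10^2 - 2 = 98 = 2 \cdot 7^2$ and $108^2 - 2 = 11662 = 34 \cdot 7^3$.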
8.4. Examples. (1) The congruence

$f(x) = x^{p-1} - 1 \equiv 0 \pmod{p}$

has the $p - 1$ roots $1, 2, \ldots, p - 1$; and if $\xi$ is any one of these, then

$f'(\xi) = (p - 1)\xi^{p-2} \not\equiv 0 \pmod{p}$.

Hence $f(x) \equiv 0 \pmod{p^2}$ has just $p - 1$ roots. Repeating the argument,
we obtain

Theorem 124. The congruence

$x^{p-1} - 1 \equiv 0 \pmod{p^a}$

has just $p - 1$ roots for every $a$.

(2) We consider next the congruence

(8.4.1) $f(x) = x^{\frac12 p(p-1)} - 1 \equiv 0 \pmod{p^2}$,

where $p$ is an odd prime. Here

$f'(\xi) = \tfrac12 p(p - 1)\xi^{\frac12 p(p-1) - 1} \equiv 0 \pmod{p}$

for every $\xi$. Hence there are $p$ roots of (8.4.1) corresponding to every root
of $f(x) \equiv 0 \pmod{p}$.

Now, by Theorem 83,

$x^{\frac12 (p-1)} \equiv \pm 1 \pmod{p}$

according as $x$ is a quadratic residue or non-residue of $p$, and

$x^{\frac12 p(p-1)} \equiv \pm 1 \pmod{p}$

in the same cases. Hence there are $\frac12 (p - 1)$ roots of $f(x) \equiv 0 \pmod{p}$,
and $\frac12 p(p - 1)$ of (8.4.1).

We define the quadratic residues and non-residues of $p^2$ as we defined
those of $p$ in § 6.5. We consider only numbers prime to $p$. We say that $x$ is
a residue of $p^2$ if (i) $(x, p) = 1$ and (ii) there is a $y$ for which

$y^2 \equiv x \pmod{p^2}$,

and a non-residue if (i) $(x, p) = 1$ and (ii) there is no such $y$.
If $x$ is a quadratic residue of $p^2$, then, by Theorem 72,

$x^{\frac12 p(p-1)} \equiv y^{p(p-1)} \equiv 1 \pmod{p^2}$,

so that $x$ is one of the $\frac12 p(p - 1)$ roots of (8.4.1). On the other hand, if
$y_1$ and $y_2$ are two of the $p(p - 1)$ numbers less than and prime to $p^2$, and
$y_1^2 \equiv y_2^2$, then either $y_2 = p^2 - y_1$ or $y_1 - y_2$ and $y_1 + y_2$ are both divisible
by $p$, which is impossible because $y_1$ and $y_2$ are not divisible by $p$. Hence
the numbers $y^2$ give just $\frac12 p(p - 1)$ incongruent residues (mod $p^2$), and
there are $\frac12 p(p - 1)$ quadratic residues of $p^2$, namely the roots of (8.4.1).

Theorem 125. There are $\frac12 p(p - 1)$ quadratic residues of $p^2$, and these
residues are the roots of (8.4.1).

(3) We consider finally the congruence

(8.4.2) $f(x) = x^2 - c \equiv 0 \pmod{p^a}$,

where $p \nmid c$. If $p$ is odd, then

$f'(\xi) = 2\xi \not\equiv 0 \pmod{p}$

for any $\xi$ not divisible by $p$. Hence the number of roots of (8.4.2) is the
same as that of the similar congruences to moduli $p^{a-1}, p^{a-2}, \ldots, p$; that
is to say, two or none, according as $c$ is or is not a quadratic residue of $p$.
We could use this argument as a substitute for the last paragraph of (2).

The situation is a little more complex when $p = 2$, since then $f'(\xi) \equiv 0$
(mod $p$) for every $\xi$. We leave it to the reader to show that there are two
roots or none when $a = 2$ and four or none when $a \ge 3$.
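
The counts stated in (3), including the exercise for $p = 2$, can be confirmed for small moduli by exhaustive search. This is only a numerical check, not a proof; the helper `sqrt_count` and the sample values $c = 2, 3$ with $p = 7$ are choices made for this sketch.

```python
def sqrt_count(c, p, a):
    """Number of x in 0..p^a - 1 with x^2 = c (mod p^a)."""
    m = p ** a
    return sum(1 for x in range(m) if (x * x - c) % m == 0)

# p = 7: 2 is a quadratic residue (3^2 = 9), 3 is a non-residue.
for a in range(1, 4):
    print(7, a, sqrt_count(2, 7, a), sqrt_count(3, 7, a))   # always 2 and 0

# p = 2: the set of counts that occur as c runs over the odd residues.
for a in range(2, 6):
    print(2, a, sorted({sqrt_count(c, 2, a) for c in range(1, 2 ** a, 2)}))
# a = 2 gives [0, 2]; a = 3, 4, 5 give [0, 4]
```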

8.5. Bauer’s identical congruence. We denote by $t$ one of the $\phi(m)$
numbers less than and prime to $m$, by $t(m)$ the set of such numbers, and by

(8.5.1) $f_m(x) = \prod_{t(m)} (x - t)$

a product extended over all the $t$ of $t(m)$. Lagrange’s Theorem 112 states
that

(8.5.2) $f_m(x) \equiv x^{\phi(m)} - 1 \pmod{m}$

when $m$ is prime. Since

$x^{\phi(m)} - 1 \equiv 0 \pmod{m}$

has always the $\phi(m)$ roots $t$, we might expect (8.5.2) to be true for all $m$;
but this is false. Thus, when $m = 9$, $t$ has the 6 values $\pm 1, \pm 2, \pm 4$ (mod 9),
and

$f_m(x) \equiv (x^2 - 1^2)(x^2 - 2^2)(x^2 - 4^2) \equiv x^6 - 3x^4 + 3x^2 - 1 \pmod{9}$.

The correct generalization was found comparatively recently by Bauer,
and is contained in the two theorems which follow.

Theorem 126. If $p$ is an odd prime divisor of $m$, and $p^a$ is the highest
power of $p$ which divides $m$, then

(8.5.3) $f_m(x) = \prod_{t(m)} (x - t) \equiv (x^{p-1} - 1)^{\phi(m)/(p-1)} \pmod{p^a}$.

In particular

(8.5.4) $f_{p^a}(x) = \prod_{t(p^a)} (x - t) \equiv (x^{p-1} - 1)^{p^{a-1}} \pmod{p^a}$.

Theorem 127. If $m$ is even, $m > 2$, and $2^a$ is the highest power of 2
which divides $m$, then

(8.5.5) $f_m(x) \equiv (x^2 - 1)^{\frac12 \phi(m)} \pmod{2^a}$.

In particular

(8.5.6) $f_{2^a}(x) \equiv (x^2 - 1)^{2^{a-2}} \pmod{2^a}$

when $a > 1$.

In the trivial case $m = 2$, $f_2(x) = x - 1$. This falls under (8.5.3) and not under (8.5.5).
We suppose first that $p > 2$, and begin by proving (8.5.4). This is true
when $a = 1$. If $a > 1$, the numbers in $t(p^a)$ are the numbers

$t + \nu p^{a-1}$ $(0 \le \nu < p)$,

where $t$ is a number included in $t(p^{a-1})$. Hence

$f_{p^a}(x) = \prod_{\nu=0}^{p-1} f_{p^{a-1}}(x - \nu p^{a-1})$.

But

$f_{p^{a-1}}(x - \nu p^{a-1}) \equiv f_{p^{a-1}}(x) - \nu p^{a-1} f'_{p^{a-1}}(x) \pmod{p^a}$;

and

$f_{p^a}(x) \equiv \{f_{p^{a-1}}(x)\}^p - \sum \nu\, p^{a-1} \{f_{p^{a-1}}(x)\}^{p-1} f'_{p^{a-1}}(x)$

$\equiv \{f_{p^{a-1}}(x)\}^p \pmod{p^a}$,

since $\sum \nu = \frac12 p(p - 1) \equiv 0 \pmod{p}$.

This proves (8.5.4) by induction.

Suppose now that $m = p^a M$ and that $p \nmid M$. Let $t$ run through the $\phi(p^a)$
numbers of $t(p^a)$ and $T$ through the $\phi(M)$ numbers of $t(M)$. By Theorem
61, the resulting set of $\phi(m)$ numbers

$tM + Tp^a$,

reduced mod $m$, is just the set $t(m)$. Hence

$f_m(x) = \prod_{t(m)} (x - t) \equiv \prod_{T \in t(M)} \prod_{t \in t(p^a)} (x - tM - Tp^a) \pmod{m}$.

For any fixed $T$, since $(p^a, M) = 1$,

$\prod_{t \in t(p^a)} (x - tM - Tp^a) \equiv \prod_{t \in t(p^a)} (x - tM)$

$\equiv \prod_{t \in t(p^a)} (x - t) = f_{p^a}(x) \pmod{p^a}$.

Hence, since there are $\phi(M)$ members of $t(M)$,

$f_m(x) \equiv (x^{p-1} - 1)^{p^{a-1}\phi(M)} \pmod{p^a}$

by (8.5.4). But (8.5.3) follows at once, since

$p^{a-1}\phi(M) = \frac{\phi(p^a)}{p - 1}\,\phi(M) = \frac{\phi(m)}{p - 1}$.
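
Bauer's congruence (8.5.3) is an identity between polynomials, so it can be tested by multiplying out both sides and comparing coefficients. The sketch below does this for a few small $m$; the list-of-coefficients representation (lowest degree first) and the helper names are assumptions of this example.

```python
from math import gcd

def poly_mul(f, g, mod):
    """Product of two polynomials (lowest degree first), coefficients reduced mod `mod`."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % mod
    return out

def poly_pow(f, e, mod):
    out = [1]
    for _ in range(e):
        out = poly_mul(out, f, mod)
    return out

def bauer_holds(m, p):
    """Check (8.5.3) for an odd prime p dividing m."""
    a = 0
    while m % p ** (a + 1) == 0:
        a += 1
    mod = p ** a                                   # highest power of p dividing m
    units = [t for t in range(1, m) if gcd(t, m) == 1]
    lhs = [1]
    for t in units:
        lhs = poly_mul(lhs, [-t, 1], mod)          # multiply by (x - t)
    base = [-1] + [0] * (p - 2) + [1]              # x^(p-1) - 1
    rhs = poly_pow(base, len(units) // (p - 1), mod)
    return lhs == rhs

print([bauer_holds(m, p) for m, p in [(9, 3), (15, 3), (15, 5), (45, 3), (63, 7)]])
```

For $m = 9$ the left-hand side is $x^6 - 3x^4 + 3x^2 - 1$, which agrees with $(x^2 - 1)^3$ and not with $x^6 - 1$.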
8.6. Bauer’s congruence: the case p = 2. We have now to consider
the case $p = 2$. We begin by proving (8.5.6).

If $a = 2$,

$f_4(x) = (x - 1)(x - 3) \equiv x^2 - 1 \pmod{4}$,

which is (8.5.6). When $a > 2$, we proceed by induction. If

$f_{2^{a-1}}(x) \equiv (x^2 - 1)^{2^{a-3}} \pmod{2^{a-1}}$,

then

$f'_{2^{a-1}}(x) \equiv 0 \pmod{2}$.

Hence

$f_{2^a}(x) = f_{2^{a-1}}(x)\, f_{2^{a-1}}(x - 2^{a-1})$

$\equiv \{f_{2^{a-1}}(x)\}^2 - 2^{a-1} f_{2^{a-1}}(x)\, f'_{2^{a-1}}(x)$

$\equiv \{f_{2^{a-1}}(x)\}^2 \equiv (x^2 - 1)^{2^{a-2}} \pmod{2^a}$.

Passing to the proof of (8.5.5), we have now to distinguish two cases.

(1) If $m = 2M$ and $M > 1$, where $M$ is odd, then

$f_m(x) \equiv (x - 1)^{\phi(m)} \equiv (x^2 - 1)^{\frac12 \phi(m)} \pmod{2}$,

because $(x - 1)^2 \equiv x^2 - 1 \pmod{2}$.

(2) If $m = 2^a M$, where $M$ is odd and $a > 1$, we argue as in § 8.5, but
use (8.5.6) instead of (8.5.4). The set of $\phi(m) = 2^{a-1}\phi(M)$ numbers

$tM + T2^a$,

reduced mod $m$, is just the set $t(m)$. Hence

$f_m(x) = \prod_{t(m)} (x - t) \equiv \prod_{T \in t(M)} \prod_{t \in t(2^a)} (x - tM - 2^a T) \pmod{m}$

$\equiv \{f_{2^a}(x)\}^{\phi(M)} \pmod{2^a}$,

just as in § 8.5. (8.5.5) follows at once from this and (8.5.6).
8.7. A theorem of Leudesdorf. We can use Bauer’s theorem to obtain
a comprehensive generalization of Wolstenholme’s Theorem 115.

Theorem 128. If

$S_m = \sum_{t(m)} \frac{1}{t}$,

then

(8.7.1) $S_m \equiv 0 \pmod{m^2}$ if $2 \nmid m$, $3 \nmid m$;

(8.7.2) $S_m \equiv 0 \pmod{\tfrac13 m^2}$ if $2 \nmid m$, $3 \mid m$;

(8.7.3) $S_m \equiv 0 \pmod{\tfrac12 m^2}$ if $2 \mid m$, $3 \nmid m$, and $m$ is not a power of 2;

(8.7.4) $S_m \equiv 0 \pmod{\tfrac16 m^2}$ if $2 \mid m$, $3 \mid m$; and

(8.7.5) $S_m \equiv 0 \pmod{\tfrac14 m^2}$ if $m = 2^a$.

We use $\sum$, $\prod$ for sums or products over the range $t(m)$, and $\sum'$, $\prod'$ for
sums or products over the part of the range in which $t$ is less than $\frac12 m$; and
we suppose that $m = p^a q^b r^c \cdots$.

If $p > 2$ then, by Theorem 126,

(8.7.6) $(x^{p-1} - 1)^{\phi(m)/(p-1)} \equiv \prod (x - t)$

$= \prod{}' \{(x - t)(x - m + t)\} \equiv \prod{}' \{x^2 + t(m - t)\} \pmod{p^a}$.

We compare the coefficients of $x^2$ on the two sides of (8.7.6). If $p > 3$, the
coefficient on the left is 0, and

(8.7.7) $0 \equiv \prod{}' \{t(m - t)\} \sum{}' \frac{1}{t(m - t)} = \frac12 \prod t \sum \frac{1}{t(m - t)} \pmod{p^a}$.

Hence

$S_m \prod t = \frac12 \prod t \sum \left( \frac{1}{t} + \frac{1}{m - t} \right) = \frac12 m \prod t \sum \frac{1}{t(m - t)} \equiv 0 \pmod{p^{2a}}$,

or

(8.7.8) $S_m \equiv 0 \pmod{p^{2a}}$.

If $2 \nmid m$, $3 \nmid m$, and we apply (8.7.8) to every prime factor of $m$, we obtain
(8.7.1).

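If $p = 3$, then (8.7.7) must be replaced by

$(-1)^{\frac12 \phi(m) - 1}\, \tfrac12 \phi(m) \equiv \frac12 \prod t \sum \frac{1}{t(m - t)} \pmod{3^a}$;

so that

$S_m \prod t \equiv (-1)^{\frac12 \phi(m) - 1}\, \tfrac12 m \phi(m) \pmod{3^{2a}}$.

Since $\phi(m)$ is even, and divisible by $3^{a-1}$, this gives

$S_m \equiv 0 \pmod{3^{2a-1}}$.

Hence we obtain (8.7.2).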

If $p = 2$, then, by Theorem 127,

$(x^2 - 1)^{\frac12 \phi(m)} \equiv \prod{}' \{x^2 + t(m - t)\} \pmod{2^a}$

and so

$S_m \prod t = \frac12 m \prod t \sum \frac{1}{t(m - t)} \equiv (-1)^{\frac12 \phi(m) - 1}\, \tfrac12 m \phi(m) \pmod{2^{2a}}$.

If $m = 2^a M$, where $M$ is odd and greater than 1, then

$\tfrac12 \phi(m) = 2^{a-2} \phi(M)$

is divisible by $2^{a-1}$, and

$S_m \equiv 0 \pmod{2^{2a-1}}$.

This, with the preceding results, gives (8.7.3) and (8.7.4).
Finally, if $m = 2^a$, $\tfrac12 \phi(m) = 2^{a-2}$, and

$S_m \equiv 0 \pmod{2^{2a-2}}$.

This is (8.7.5).
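
Theorem 128 can be checked numerically with exact rational arithmetic. In the sketch below a congruence $S_m \equiv 0$ is read as a statement about the numerator of $S_m$ in lowest terms (the denominator is prime to $m$); the function names are assumptions of this example.

```python
from fractions import Fraction
from math import gcd

def leudesdorf_modulus(m):
    """Modulus asserted for S_m by (8.7.1)-(8.7.5), for m > 2."""
    if m & (m - 1) == 0:                      # m is a power of 2
        return m * m // 4
    divisor = (2 if m % 2 == 0 else 1) * (3 if m % 3 == 0 else 1)
    return m * m // divisor

def S(m):
    """Sum of 1/t over the t less than and prime to m, as an exact fraction."""
    return sum(Fraction(1, t) for t in range(1, m) if gcd(t, m) == 1)

for m in range(3, 60):
    assert S(m).numerator % leudesdorf_modulus(m) == 0

print(S(9), leudesdorf_modulus(9))    # 621/280 27
print(S(12), leudesdorf_modulus(12))  # 552/385 24
```

Here $621 = 27 \cdot 23$ and $552 = 24 \cdot 23$.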

8.8. Further consequences of Bauer’s theorem. (1) Suppose that

$m > 2$, $m = \prod p^a$, $u_2 = \tfrac12 \phi(m)$, $u_p = \frac{\phi(m)}{p - 1}$ $(p > 2)$.

Then $\phi(m)$ is even and, when we equate the constant terms in (8.5.3) and
(8.5.5), we obtain

$\prod_{t(m)} t \equiv (-1)^{u_p} \pmod{p^a}$.

It is easily verified that the numbers $u_2$ and $u_p$ are all even, except when
$m$ is of one of the special forms 4, $p^a$, or $2p^a$; so that $\prod t \equiv 1 \pmod{m}$
except in these cases. If $m = 4$, then $\prod t = 1 \cdot 3 \equiv -1 \pmod{4}$. If $m$ is $p^a$
or $2p^a$, then $u_p$ is odd, so that $\prod t \equiv -1 \pmod{p^a}$ and therefore (since $\prod t$
is odd) $\prod t \equiv -1 \pmod{m}$.

Theorem 129.

$\prod_{t(m)} t \equiv \pm 1 \pmod{m}$,

where the negative sign is to be chosen when $m$ is 4, $p^a$, or $2p^a$, where $p$ is
an odd prime, and the positive sign in all other cases.

The case $m = p$ is Wilson’s theorem.
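
The sign rule of Theorem 129 can be verified for small $m$ by computing the product directly. The helper names in this Python sketch are hypothetical, and the test of the special forms 4, $p^a$, $2p^a$ is written by trial division for brevity.

```python
from math import gcd

def unit_product(m):
    """Product of the t less than and prime to m, reduced mod m."""
    prod = 1
    for t in range(1, m):
        if gcd(t, m) == 1:
            prod = prod * t % m
    return prod

def is_special(m):
    """True when m > 2 is 4, p^a or 2*p^a with p an odd prime."""
    if m == 4:
        return True
    if m % 2 == 0:
        m //= 2
    if m % 2 == 0 or m == 1:
        return False
    p = next(d for d in range(3, m + 1, 2) if m % d == 0)   # least prime factor
    while m % p == 0:
        m //= p
    return m == 1

for m in range(3, 200):
    expected = m - 1 if is_special(m) else 1    # -1 or +1 (mod m)
    assert unit_product(m) == expected

print(unit_product(9), unit_product(15), unit_product(18))   # 8 1 17
```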

(2) If $p > 2$ and

$f(x) = \prod_{t(p^a)} (x - t) = x^{\phi(p^a)} - A_1 x^{\phi(p^a) - 1} + \cdots$,

then $f(x) = f(p^a - x)$. Hence

$2A_1 x^{\phi(p^a) - 1} + 2A_3 x^{\phi(p^a) - 3} + \cdots = f(-x) - f(x) = f(p^a + x) - f(x)$

$\equiv p^a f'(x) \pmod{p^{2a}}$.

But

$p^a f'(x) \equiv p^{2a-1}(p - 1)x^{p-2}(x^{p-1} - 1)^{p^{a-1} - 1} \pmod{p^{2a}}$

by Theorem 126. It follows that $A_{2\nu+1}$ is a multiple of $p^{2a}$ except when

$\phi(p^a) - 2\nu - 1 \equiv p - 2 \pmod{p - 1}$,

i.e. when

$2\nu \equiv 0 \pmod{p - 1}$.

Theorem 130. If $A_{2\nu+1}$ is the sum of the homogeneous products, $2\nu + 1$
at a time, of the numbers of $t(p^a)$, and $2\nu$ is not a multiple of $p - 1$, then

$A_{2\nu+1} \equiv 0 \pmod{p^{2a}}$.

Wolstenholme’s theorem is the case

$a = 1$, $2\nu + 1 = p - 2$, $p > 3$.

(3) There are also interesting theorems concerning the sums

$S_{2\nu+1} = \sum \frac{1}{t^{2\nu+1}}$.

We confine ourselves for simplicity to the case $a = 1$, $m = p$,† and suppose
$p > 2$. Then $f(x) = f(p - x)$ and

$f(-x) = f(p + x) \equiv f(x) + pf'(x)$,

$f'(-x) = -f'(p + x) \equiv -f'(x) - pf''(x)$,

$f(x)f'(-x) + f'(x)f(-x) \equiv p\{f'^2(x) - f(x)f''(x)\}$

to modulus $p^2$. Since $f(x) \equiv x^{p-1} - 1 \pmod{p}$,

$f'^2(x) - f(x)f''(x) \equiv 2x^{p-3} - x^{2p-4} \pmod{p}$

and so

(8.8.1) $f(x)f'(-x) + f'(x)f(-x) \equiv p(2x^{p-3} - x^{2p-4}) \pmod{p^2}$.

† In this case Theorem 112 is sufficient for our purpose, and we do not require the general form of
Bauer’s theorem.
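Now

$\frac{f'(x)}{f(x)} = \sum \frac{1}{x - t} = -S_1 - xS_2 - x^2 S_3 - \cdots$,†

(8.8.2) $\frac{f(x)f'(-x) + f'(x)f(-x)}{f(x)f(-x)} = -2S_1 - 2x^2 S_3 - \cdots$.

Also

$f(x) = \prod (x - t) = \prod (t - x) = \varpi \left( 1 + \frac{a_1 x}{\varpi} + \frac{a_2 x^2}{\varpi} + \cdots \right)$,

$\frac{1}{f(x)} = \frac{1}{\varpi} \left( 1 + \frac{b_1 x}{\varpi} + \frac{b_2 x^2}{\varpi^2} + \cdots \right)$,

(8.8.3) $\frac{1}{f(x)f(-x)} = \frac{1}{\varpi^2} \left( 1 + \frac{c_1 x^2}{\varpi^2} + \frac{c_2 x^4}{\varpi^4} + \cdots \right)$,

where $\varpi = (p - 1)!$ and the $a$, $b$, and $c$ are integers. It follows from (8.8.1),
(8.8.2), and (8.8.3) that

$-2S_1 - 2x^2 S_3 - \cdots = \frac{p(2x^{p-3} - x^{2p-4}) + p^2 g(x)}{\varpi^2} \times \left( 1 + \frac{c_1 x^2}{\varpi^2} + \frac{c_2 x^4}{\varpi^4} + \cdots \right)$,

where $g(x)$ is an integral polynomial. Hence, if $2\nu < p - 3$, the numerator
of $S_{2\nu+1}$ is divisible by $p^2$.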

Theorem 131. If $p$ is prime, $2\nu < p - 3$, and

$S_{2\nu+1} = 1 + \frac{1}{2^{2\nu+1}} + \cdots + \frac{1}{(p - 1)^{2\nu+1}}$,

then the numerator of $S_{2\nu+1}$ is divisible by $p^2$.

The case $\nu = 0$ is Wolstenholme’s theorem. When $\nu = 1$, $p$ must be
greater than 5. The numerator of

$1 + \frac{1}{2^3} + \frac{1}{3^3} + \frac{1}{4^3}$

is divisible by 5 but not by $5^2$.

There are many more elaborate theorems of the same character.


† The series which follow are ordinary power series in the variable $x$.
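
Theorem 131 and the exceptional case $p = 5$, $\nu = 1$ can be checked with exact fractions. The function `power_sum` and the particular primes tried are assumptions of this sketch.

```python
from fractions import Fraction

def power_sum(p, k):
    """S_k = 1 + 1/2^k + ... + 1/(p-1)^k as an exact fraction."""
    return sum(Fraction(1, t ** k) for t in range(1, p))

# The exceptional case: 2v < p - 3 fails for p = 5, v = 1.
print(power_sum(5, 3).numerator)          # 2035 = 5 * 11 * 37

# Cases covered by the theorem.
for p in (5, 7, 11, 13):
    v = 0
    while 2 * v < p - 3:
        assert power_sum(p, 2 * v + 1).numerator % p ** 2 == 0
        v += 1
```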
8.9. The residues of $2^{p-1}$ and $(p - 1)!$ to modulus $p^2$. Fermat’s and
Wilson’s theorems show that $2^{p-1}$ and $(p - 1)!$ have the residues 1 and
$-1$ (mod $p$). Little is known about their residues (mod $p^2$), but they can be
transformed in interesting ways.

Theorem 132. If $p$ is an odd prime, then

(8.9.1) $\frac{2^{p-1} - 1}{p} \equiv 1 + \frac13 + \frac15 + \cdots + \frac{1}{p - 2} \pmod{p}$.

In other words, the residue of $2^{p-1}$ (mod $p^2$) is

$1 + p \left( 1 + \frac13 + \frac15 + \cdots + \frac{1}{p - 2} \right)$,

where the fractions indicate associates (mod $p$).

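We have

$2^p = (1 + 1)^p = 1 + \binom{p}{1} + \cdots + \binom{p}{p} = 2 + \sum_{l=1}^{p-1} \binom{p}{l}$.

Every term on the right, except the first, is divisible by $p$,† and

$\binom{p}{l} = px_l$,

where

$l!\,x_l = (p - 1)(p - 2) \cdots (p - l + 1) \equiv (-1)^{l-1}(l - 1)! \pmod{p}$,

or $lx_l \equiv (-1)^{l-1} \pmod{p}$. Hence

$x_l \equiv (-1)^{l-1} \frac{1}{l} \pmod{p}$,

$\binom{p}{l} = px_l \equiv (-1)^{l-1} \frac{p}{l} \pmod{p^2}$,

(8.9.2) $\frac{2^p - 2}{p} = \sum_{l=1}^{p-1} x_l \equiv 1 - \frac12 + \frac13 - \cdots - \frac{1}{p - 1} \pmod{p}$.

† By Theorem 75.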
But

$1 - \frac12 + \frac13 - \cdots - \frac{1}{p - 1}$

$= 2 \left( 1 + \frac13 + \frac15 + \cdots + \frac{1}{p - 2} \right) - \left( 1 + \frac12 + \frac13 + \cdots + \frac{1}{p - 1} \right)$

$\equiv 2 \left( 1 + \frac13 + \cdots + \frac{1}{p - 2} \right) \pmod{p}$,

by Theorem 116,† so that (8.9.2) is equivalent to (8.9.1).
Alternatively, after Theorem 116, the residue in (8.9.1) is

$-\frac12 - \frac14 - \cdots - \frac{1}{p - 1} \pmod{p}$.


Theorem 133. If $p$ is an odd prime, then

$(p - 1)! \equiv (-1)^{\frac12 (p-1)}\, 2^{2p-2} \left\{ \left( \tfrac{p - 1}{2} \right)! \right\}^2 \pmod{p^2}$.

Let $p = 2n + 1$. Then

$\frac{(2n)!}{2^n n!} = 1 \cdot 3 \cdots (2n - 1) = (p - 2)(p - 4) \cdots (p - 2n)$,

$(-1)^n \frac{(2n)!}{2^n n!} \equiv 2^n n! - 2^n n!\, p \left( \frac12 + \frac14 + \cdots + \frac{1}{2n} \right) \pmod{p^2}$

$\equiv 2^n n! + 2^n n!\,(2^{2n} - 1) \pmod{p^2}$,

by Theorems 116 and 132; and

$(2n)! \equiv (-1)^n 2^{4n} (n!)^2 \pmod{p^2}$.


† We need only (7.8.2).
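
Both theorems of this section lend themselves to a direct numerical check for small primes. In the sketch below the fractions of (8.9.1) are replaced by their associates (mod $p$), computed with `pow(k, -1, p)` (Python 3.8 or later); the function names are assumptions of this example.

```python
from math import factorial

def check_132(p):
    """(2^(p-1) - 1)/p = 1 + 1/3 + ... + 1/(p-2) (mod p), for an odd prime p."""
    lhs = ((pow(2, p - 1, p * p) - 1) // p) % p
    rhs = sum(pow(k, -1, p) for k in range(1, p - 1, 2)) % p
    return lhs == rhs

def check_133(p):
    """(p-1)! = (-1)^((p-1)/2) * 2^(2p-2) * (((p-1)/2)!)^2 (mod p^2)."""
    n = (p - 1) // 2
    rhs = (-1) ** n * pow(2, 2 * p - 2, p * p) * factorial(n) ** 2
    return (factorial(p - 1) - rhs) % (p * p) == 0

primes = [3, 5, 7, 11, 13, 17, 19, 23]
print(all(check_132(p) for p in primes), all(check_133(p) for p in primes))   # True True
```

For $p = 7$, for instance, $2^6 = 64 \equiv 15 = 1 + 7 \cdot 2 \pmod{49}$, and $1 + \frac13 + \frac15 \equiv 1 + 5 + 3 \equiv 2 \pmod{7}$.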
NOTES

§ 8.1. Theorem 121 (Gauss, D.A., § 36) was known to the Chinese mathematician
Sun-Tsu in the first century A.D. See Bachmann, Niedere Zahlentheorie, i. 83.

§ 8.5. Bauer, Nouvelles annales (4), 2 (1902), 256-64. Rear-Admiral C. R. Darlington
suggested the method by which I deduce (8.5.3) from (8.5.4). This is much simpler
than that used in earlier editions, which was given by Hardy and Wright, Journal London
Math. Soc. 9 (1934), 38-41 and 240.

Dr. Wylie points out to us that (8.5.5) is equivalent to (8.5.3), with 2 for $p$, except when
$m$ is a power of 2, since it may easily be verified that

$(x^2 - 1)^{\frac12 \phi(m)} \equiv (x - 1)^{\phi(m)} \pmod{2^a}$

when $m = 2^a M$, $M$ is odd, and $M > 1$.

§ 8.7. Leudesdorf, Proc. London Math. Soc. (1) 20 (1889), 199-212. See also S. Chowla,
Journal London Math. Soc. 9 (1934), 246; N. Rama Rao, ibid. 12 (1937), 247-50; and
E. Jacobstal, Forhand. K. Norske Vidensk. Selskab, 22 (1949), nos. 12, 13, 41.

§ 8.8. Theorem 129 (Gauss, D.A., § 78) is sometimes called the ‘generalized Wilson’s
theorem’.

Many theorems of the type of Theorems 130 and 131 will be found in Leudesdorf’s
paper quoted above, and in papers by Glaisher in vols. 31 and 32 of the Quarterly Journal
of Mathematics.

§ 8.9. Theorem 132 is due to Eisenstein (1850). Full references to later proofs and 
generalizations will be found in Dickson, History, i, ch. iv. See also the note to § 6.6. 



IX 

THE REPRESENTATION OF NUMBERS BY DECIMALS 

9.1. The decimal associated with a given number. There is a process
for expressing any positive number $\xi$ as a ‘decimal’ which is familiar in
elementary arithmetic.

We write

(9.1.1) $\xi = [\xi] + x = X + x$,

where $X$ is an integer and $0 \le x < 1$,† and consider $X$ and $x$ separately.
If $X > 0$ and

$10^s \le X < 10^{s+1}$,

and $A_1$ and $X_1$ are the quotient and remainder when $X$ is divided by $10^s$,
then

$X = A_1 \cdot 10^s + X_1$,

where

$0 < A_1 = [10^{-s} X] < 10$, $0 \le X_1 < 10^s$.

Similarly

$X_1 = A_2 \cdot 10^{s-1} + X_2$ $(0 \le A_2 < 10,\ 0 \le X_2 < 10^{s-1})$,

$X_2 = A_3 \cdot 10^{s-2} + X_3$ $(0 \le A_3 < 10,\ 0 \le X_3 < 10^{s-2})$,

...

$X_{s-1} = A_s \cdot 10 + X_s$ $(0 \le A_s < 10,\ 0 \le X_s < 10)$,

$X_s = A_{s+1}$ $(0 \le A_{s+1} < 10)$.

Thus $X$ may be expressed uniquely in the form

(9.1.2) $X = A_1 \cdot 10^s + A_2 \cdot 10^{s-1} + \cdots + A_s \cdot 10 + A_{s+1}$,

where every $A$ is one of 0, 1, 2, ..., 9, and $A_1$ is not 0. We abbreviate this
expression to

(9.1.3) $X = A_1 A_2 \ldots A_s A_{s+1}$,

the ordinary representation of $X$ in decimal notation.

† Thus $[\xi]$ has the same meaning as in § 6.11.
Passing to $x$, we write

$x = f_1$ $(0 \le f_1 < 1)$.

We suppose that $a_1 = [10f_1]$, so that

$\frac{a_1}{10} \le f_1 < \frac{a_1 + 1}{10}$;

$a_1$ is one of 0, 1, ..., 9, and

$a_1 = [10f_1]$, $10f_1 = a_1 + f_2$ $(0 \le f_2 < 1)$.

Similarly, we define $a_2, a_3, \ldots$ by

$a_2 = [10f_2]$, $10f_2 = a_2 + f_3$ $(0 \le f_3 < 1)$,

$a_3 = [10f_3]$, $10f_3 = a_3 + f_4$ $(0 \le f_4 < 1)$,

...

Every $a_n$ is one of 0, 1, 2, ..., 9. Thus

(9.1.4) $x = x_n + g_{n+1}$,

where

(9.1.5) $x_n = \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_n}{10^n}$,

(9.1.6) $0 \le g_{n+1} = \frac{f_{n+1}}{10^n} < \frac{1}{10^n}$.

We thus define a decimal

$.a_1 a_2 a_3 \ldots a_n \ldots$

associated with $x$. We call $a_1, a_2, \ldots$ the first, second, ... digits of the
decimal.

Since $a_n < 10$, the series

(9.1.7) $\sum_{n=1}^{\infty} \frac{a_n}{10^n}$

is convergent; and since $g_{n+1} \to 0$, its sum is $x$. We may therefore write

(9.1.8) $x = .a_1 a_2 a_3 \ldots$,

the right-hand side being an abbreviation for the series (9.1.7).
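
The construction of the digits $a_n$ is itself an algorithm. The following Python sketch follows it step by step for a rational $x$, using exact fractions so that no rounding intervenes; the function name `decimal_digits` is an assumption of this example.

```python
from fractions import Fraction

def decimal_digits(x, n):
    """First n digits a_1..a_n of 0 <= x < 1, by a_k = [10 f_k], f_(k+1) = 10 f_k - a_k."""
    f = Fraction(x)
    digits = []
    for _ in range(n):
        a = int(10 * f)      # integral part, since 10 f is non-negative
        digits.append(a)
        f = 10 * f - a       # the next remainder, again in [0, 1)
    return digits

print(decimal_digits(Fraction(17, 400), 6))   # [0, 4, 2, 5, 0, 0]
print(decimal_digits(Fraction(1, 7), 8))      # [1, 4, 2, 8, 5, 7, 1, 4]
```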
If $f_{n+1} = 0$ for some $n$, i.e. if $10^n x$ is an integer, then

$a_{n+1} = a_{n+2} = \cdots = 0$.

In this case we say that the decimal terminates. Thus

$\frac{17}{400} = .0425000\ldots$,

and we write simply

$\frac{17}{400} = .0425$.

It is plain that the decimal for $x$ will terminate if and only if $x$ is a rational
fraction whose denominator is of the form $2^\alpha 5^\beta$.

Since

$\frac{a_{n+1}}{10^{n+1}} + \frac{a_{n+2}}{10^{n+2}} + \cdots = g_{n+1} < \frac{1}{10^n}$

and

$\frac{9}{10^{n+1}} + \frac{9}{10^{n+2}} + \cdots = \frac{9}{10^{n+1}\left(1 - \frac{1}{10}\right)} = \frac{1}{10^n}$,

it is impossible that every $a_n$ from a certain point on should be 9. With
this reservation, every possible sequence $(a_n)$ will arise from some $x$. We
define $x$ as the sum of the series (9.1.7), and $x_n$ and $g_{n+1}$ as in (9.1.4) and
(9.1.5). Then $g_{n+1} < 10^{-n}$ for every $n$, and $x$ yields the sequence required.

Finally, if

(9.1.9) $\sum_{n=1}^{\infty} \frac{a_n}{10^n} = \sum_{n=1}^{\infty} \frac{b_n}{10^n}$,

and the $b_n$ satisfy the conditions already imposed on the $a_n$, then $a_n = b_n$
for every $n$. For if not, let $a_N$ and $b_N$ be the first pair which differ, so that
$|a_N - b_N| \ge 1$. Then

$\left| \sum_{n=1}^{\infty} \frac{a_n}{10^n} - \sum_{n=1}^{\infty} \frac{b_n}{10^n} \right| \ge \frac{1}{10^N} - \sum_{n=N+1}^{\infty} \frac{|a_n - b_n|}{10^n} \ge \frac{1}{10^N} - \sum_{n=N+1}^{\infty} \frac{9}{10^n} = 0$.

This contradicts (9.1.9) unless there is equality. If there is equality, then
all of $a_{N+1} - b_{N+1}$, $a_{N+2} - b_{N+2}$, ... must have the same sign and the
absolute value 9. But then either $a_n = 9$ and $b_n = 0$ for $n > N$, or else
$a_n = 0$ and $b_n = 9$, and we have seen that each of these alternatives is
impossible. Hence $a_n = b_n$ for all $n$. In other words, different decimals
correspond to different numbers.

We now combine (9.1.1), (9.1.3), and (9.1.8) in the form

(9.1.10) $\xi = X + x = A_1 A_2 \ldots A_{s+1} . a_1 a_2 a_3 \ldots$;

and we can sum up our conclusions as follows.

Theorem 134. Any positive number $\xi$ may be expressed as a decimal

$A_1 A_2 \ldots A_{s+1} . a_1 a_2 a_3 \ldots$,

where

$0 \le A_1 < 10$, $0 \le A_2 < 10$, ..., $0 \le a_n < 10$,

not all $A$ and $a$ are 0, and an infinity of the $a_n$ are less than 9. If $\xi \ge 1$,
then $A_1 > 0$. There is a (1, 1) correspondence between the numbers and
the decimals, and

$\xi = A_1 \cdot 10^s + \cdots + A_{s+1} + \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots$.

In what follows we shall usually suppose that $0 < \xi < 1$, so that $X = 0$,
$\xi = x$. In this case all the $A$ are 0. We shall sometimes save words by ignoring
the distinction between the number $x$ and the decimal which represents
it, saying, for example, that the second digit of $\frac{17}{400}$ is 4.

9.2. Terminating and recurring decimals. A decimal which does not
terminate may recur. Thus

$\frac13 = .3333\ldots$, $\frac17 = .14285714285714\ldots$;

equations which we express more shortly as

$\frac13 = .\dot{3}$, $\frac17 = .\dot{1}4285\dot{7}$.

These are pure recurring decimals in which the period reaches back to the
beginning. On the other hand,

$\frac16 = .1666\ldots = .1\dot{6}$,

a mixed recurring decimal in which the period is preceded by one non-recurrent
digit.

We now determine the conditions for termination or recurrence.

(1) If

$x = \frac{p}{q} = \frac{p}{2^\alpha 5^\beta}$,

where $(p, q) = 1$, and

(9.2.1) $\mu = \max(\alpha, \beta)$,

then $10^n x$ is an integer for $n = \mu$, and for no smaller value of $n$, so that $x$
terminates at $a_\mu$. Conversely,

$\frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_\mu}{10^\mu} = \frac{P}{10^\mu} = \frac{p}{q}$,

where $q$ has the prime factors 2 and 5 only.

(2) Suppose next that $x = p/q$, $(p, q) = 1$, and $(q, 10) = 1$, so that $q$
is not divisible by 2 or 5. Our discussion of this case depends upon the
theorems of Ch. VI.

By Theorem 88,

$10^\nu \equiv 1 \pmod{q}$

for some $\nu$, the least such $\nu$ being a divisor of $\phi(q)$. We suppose that $\nu$ has
this smallest possible value, i.e. that, in the language of § 6.8, 10 belongs
to $\nu$ (mod $q$) or $\nu$ is the order of 10 (mod $q$). Then

(9.2.2) $10^\nu x = \frac{10^\nu p}{q} = \frac{(mq + 1)p}{q} = mp + \frac{p}{q} = mp + x$,

where $m$ is an integer. But

$10^\nu x = 10^\nu x_\nu + 10^\nu g_{\nu+1} = 10^\nu x_\nu + f_{\nu+1}$,

by (9.1.4). Since $0 < x < 1$, $f_{\nu+1} = x$, and the process by which the
decimal was constructed repeats itself from $f_{\nu+1}$ onwards. Thus $x$ is a pure
recurring decimal with a period of at most $\nu$ figures.





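On the other hand, a pure recurring decimal $.\dot{a}_1 a_2 \ldots \dot{a}_\lambda$ is equal to

$\left( \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_\lambda}{10^\lambda} \right) \left( 1 + \frac{1}{10^\lambda} + \frac{1}{10^{2\lambda}} + \cdots \right)$

$= \frac{10^{\lambda-1} a_1 + 10^{\lambda-2} a_2 + \cdots + a_\lambda}{10^\lambda - 1} = \frac{p}{q}$,

when reduced to its lowest terms. Here $q \mid 10^\lambda - 1$, and so $\lambda \ge \nu$. It follows
that if $(q, 10) = 1$, and the order of 10 (mod $q$) is $\nu$, then $x$ is a pure recurring
decimal with a period of just $\nu$ digits; and conversely.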

(3) Finally, suppose that

(9.2.3) $x = \frac{p}{q} = \frac{p}{2^\alpha 5^\beta Q}$,

where $(p, q) = 1$ and $(Q, 10) = 1$; that $\mu$ is defined as in (9.2.1); and that
$\nu$ is the order of 10 (mod $Q$). Then

$10^\mu x = \frac{p'}{Q} = X + \frac{P}{Q}$,

where $p'$, $X$, $P$ are integers and

$0 \le X < 10^\mu$, $0 < P < Q$, $(P, Q) = 1$.

If $X > 0$ then $10^s \le X < 10^{s+1}$, for some $s < \mu$, and $X = A_1 A_2 \ldots A_{s+1}$;
and the decimal for $P/Q$ is pure recurring and has a period of $\nu$ digits.
Hence

$10^\mu x = A_1 A_2 \ldots A_{s+1} . \dot{a}_1 a_2 \ldots \dot{a}_\nu$

and

(9.2.4) $x = .b_1 b_2 \ldots b_\mu \dot{a}_1 a_2 \ldots \dot{a}_\nu$,

the last $s + 1$ of the $b$ being $A_1, A_2, \ldots, A_{s+1}$ and the rest, if any, 0.

Conversely, it is plain that any decimal (9.2.4) represents a fraction
(9.2.3). We have thus proved

Theorem 135. The decimal for a rational number $p/q$ between 0 and 1
is terminating or recurring, and any terminating or recurring decimal is
equal to a rational number. If $(p, q) = 1$, $q = 2^\alpha 5^\beta$, and $\max(\alpha, \beta) = \mu$,
then the decimal terminates after $\mu$ digits. If $(p, q) = 1$, $q = 2^\alpha 5^\beta Q$, where
$Q > 1$, $(Q, 10) = 1$, and $\nu$ is the order of 10 (mod $Q$), then the decimal
contains $\mu$ non-recurring and $\nu$ recurring digits.
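
Theorem 135 gives the shape of the decimal for $p/q$ without any long division: $\mu$ comes from the powers of 2 and 5 in $q$, and $\nu$ from the order of 10 modulo what remains. A minimal Python sketch follows; the name `decimal_shape` is hypothetical, and $\nu = 0$ is used here (an assumption of the example) to signal a terminating decimal.

```python
from math import gcd

def decimal_shape(p, q):
    """Return (mu, nu): numbers of non-recurring and recurring digits of p/q, 0 < p < q."""
    q //= gcd(p, q)                 # reduce to lowest terms
    alpha = beta = 0
    while q % 2 == 0:
        q //= 2
        alpha += 1
    while q % 5 == 0:
        q //= 5
        beta += 1
    mu = max(alpha, beta)
    if q == 1:
        return mu, 0                # terminating decimal
    nu, power = 1, 10 % q
    while power != 1:               # order of 10 (mod Q)
        power = power * 10 % q
        nu += 1
    return mu, nu

print(decimal_shape(1, 3), decimal_shape(1, 7), decimal_shape(1, 6), decimal_shape(17, 400))
# (0, 1) (0, 6) (1, 1) (4, 0)
```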

9.3. Representation of numbers in other scales. There is no reason
except familiarity for our special choice of the number 10; we may replace
10 by 2 or by any greater number $r$. Thus

$\frac18 = \frac{0}{2} + \frac{0}{2^2} + \frac{1}{2^3} = .001$,

$\frac23 = \frac{1}{2} + \frac{0}{2^2} + \frac{1}{2^3} + \frac{0}{2^4} + \cdots = .\dot{1}\dot{0}$,

$\frac23 = \frac{4}{7} + \frac{4}{7^2} + \frac{4}{7^3} + \cdots = .\dot{4}$,

the first two decimals being ‘binary’ decimals or ‘decimals in the scale of
2’, the third a ‘decimal in the scale of 7’.† Generally, we speak of ‘decimals
in the scale of $r$’.

The arguments of the preceding sections may be repeated with certain
changes, which are obvious if $r$ is a prime or a product of different primes
(like 2 or 10), but require a little more consideration if $r$ has square divisors
(like 12 or 8). We confine ourselves for simplicity to the first case, when
our arguments require only trivial alterations. In § 9.1, 10 must be replaced
by $r$ and 9 by $r - 1$. In § 9.2, the part of 2 and 5 is played by the prime
divisors of $r$.


Theorem 136. Suppose that $r$ is a prime or a product of different primes.
Then any positive number $\xi$ may be represented uniquely as a decimal in
the scale of $r$. An infinity of the digits of the decimal are less than $r - 1$;
with this reservation, the correspondence between the numbers and the
decimals is (1, 1).

Suppose further that

$0 < x < 1$, $x = \frac{p}{q}$, $(p, q) = 1$.

If

$q = s^\alpha t^\beta \cdots u^\gamma$,


† We ignore the verbal contradiction involved in the use of ‘decimal’; there is no other convenient
word.
where $s, t, \ldots, u$ are the prime factors of $r$, and

$\mu = \max(\alpha, \beta, \ldots, \gamma)$,

then the decimal for $x$ terminates at the $\mu$th digit. If $q$ is prime to $r$, and
$\nu$ is the order of $r$ (mod $q$), then the decimal is pure recurring and has a
period of $\nu$ digits. If

$q = s^\alpha t^\beta \cdots u^\gamma Q$ $(Q > 1)$,

$Q$ is prime to $r$, and $\nu$ is the order of $r$ (mod $Q$), then the decimal is mixed
recurring, and has $\mu$ non-recurring and $\nu$ recurring digits.†
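
The examples of this section can be reproduced by long division in the scale of $r$, stopping when a remainder repeats. The sketch below is illustrative; the function `expand` and its return convention (non-recurring digits, then one period) are assumptions of this example.

```python
def expand(p, q, r):
    """Digits of p/q (0 < p < q) in the scale of r: (non-recurring part, recurring part)."""
    seen = {}                  # remainder -> position at which it first occurred
    digits = []
    rem = p
    while rem != 0 and rem not in seen:
        seen[rem] = len(digits)
        rem *= r
        digits.append(rem // q)
        rem %= q
    if rem == 0:
        return digits, []      # the expansion terminates
    start = seen[rem]
    return digits[:start], digits[start:]

print(expand(1, 8, 2))    # ([0, 0, 1], [])
print(expand(2, 3, 2))    # ([], [1, 0])
print(expand(2, 3, 7))    # ([], [4])
print(expand(1, 6, 10))   # ([1], [6])
```

The lengths of the two lists are the numbers $\mu$ and $\nu$ of Theorem 136.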

9.4. Irrationals defined by decimals. It follows from Theorem 136
that a decimal (in any scale‡) which neither terminates nor recurs must
represent an irrational number. Thus

$x = .0100100010\ldots$

(the number of 0’s increasing by 1 at each stage) is irrational. We consider
some less obvious examples.

Theorem 137.

$.011010100010\ldots$,

where the digit $a_n$ is 1 if $n$ is prime and 0 otherwise, is irrational.

Theorem 4 shows that the decimal does not terminate. If it recurs, there
is a function $An + B$ which is prime for all $n$ from some point onwards;
and Theorem 21 shows that this also is impossible.

This theorem is true in any scale. We state our next theorem for the scale
of 10, leaving the modifications required for other scales to the reader.

Theorem 138.

$.2357111317192329\ldots$,


† Generally, when $r = s^A t^B \cdots u^C$, we must define $\mu$ as

$\max\left( \frac{\alpha}{A}, \frac{\beta}{B}, \ldots, \frac{\gamma}{C} \right)$

if this number is an integer, and otherwise as the first greater integer.

‡ Strictly, any ‘quadratfrei’ scale (scale whose base is a prime or a product of different primes). This
is the only case actually covered by the theorems, but there is no difficulty in the extension.
where the sequence of digits is formed by the primes in ascending order, is
irrational.

The proof of Theorem 138 is a little more difficult. We give two
alternative proofs.

(1) Let us assume that any arithmetical progression of the form

$k \cdot 10^{s+1} + 1$ $(k = 1, 2, 3, \ldots)$

contains primes. Then there are primes whose expressions in the decimal
system contain an arbitrary number $s$ of 0’s, followed by a 1. Since the
decimal contains such sequences, it does not terminate or recur.

(2) Let us assume that there is a prime between $N$ and $10N$ for every
$N \ge 1$. Then, given $s$, there are primes with just $s$ digits. If the decimal
recurs, it is of the form

(9.4.1) $\ldots | a_1 a_2 \ldots a_k | a_1 a_2 \ldots a_k | \ldots$,

the bars indicating the period, and the first being placed where the first
period begins. We can choose $l > 1$ so that all primes with $s = kl$ digits
stand later in the decimal than the first bar. If $p$ is the first such prime, then
it must be of one of the forms

$p = a_1 a_2 \ldots a_k | a_1 a_2 \ldots a_k | \ldots | a_1 a_2 \ldots a_k$

or

$p = a_{m+1} \ldots a_k | a_1 a_2 \ldots a_k | \ldots | a_1 a_2 \ldots a_k | a_1 a_2 \ldots a_m$

and is divisible by $a_1 a_2 \ldots a_k$ or by $a_{m+1} \ldots a_k a_1 a_2 \ldots a_m$; a contradiction.

In our first proof we assumed a special case of Dirichlet’s Theorem 15.
This special case is easier to prove than the general theorem, but we shall
not prove it in this book, so that (1) will remain incomplete. In (2) we
assumed a result which follows at once from Theorem 418 (which we shall
prove in Chapter XXII). The latter theorem asserts that, for every $N \ge 1$,
there is at least one prime satisfying $N < p \le 2N$. It follows, a fortiori,
that $N < p < 10N$.

9.5. Tests for divisibility. In this and the next few sections we shall be
concerned for the most part with trivial but amusing puzzles.

There are not very many useful tests for the divisibility of an integer by
particular integers such as 2, 3, 5, .... A number is divisible by 2 if its last
digit is even. More generally, it is divisible by $2^\nu$ if and only if the number
represented by its last $\nu$ digits is divisible by $2^\nu$. The reason, of course, is
that $2^\nu \mid 10^\nu$; and there are similar rules for 5 and $5^\nu$.

Next

$10^\nu \equiv 1 \pmod{9}$

for every $\nu$, and therefore

$A_1 \cdot 10^s + A_2 \cdot 10^{s-1} + \cdots + A_s \cdot 10 + A_{s+1} \equiv A_1 + A_2 + \cdots + A_{s+1} \pmod{9}$.

A fortiori this is true mod 3. Hence we obtain the well-known rule ‘a number
is divisible by 9 (or by 3) if and only if the sum of its digits is divisible by
9 (or by 3)’.

There is a rather similar rule for 11. Since $10 \equiv -1 \pmod{11}$, we have

$10^{2r} \equiv 1$, $10^{2r+1} \equiv -1 \pmod{11}$,

so that

$A_1 \cdot 10^s + A_2 \cdot 10^{s-1} + \cdots + A_s \cdot 10 + A_{s+1} \equiv A_{s+1} - A_s + A_{s-1} - \cdots \pmod{11}$.

A number is divisible by 11 if and only if the difference between the sums
of its digits of odd and even ranks is divisible by 11.

We know of only one other rule of any practical use. This is a test for
divisibility by any one of 7, 11, or 13, and depends on the fact that $7 \cdot 11 \cdot 13 =
1001$. Its working is best illustrated by an example: if 29310478561 is
divisible by 7, 11 or 13, so is

$561 - 478 + 310 - 29 = 364 = 4 \cdot 7 \cdot 13$.

Hence the original number is divisible by 7 and by 13 but not by 11.
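
The last rule rests on $1000 \equiv -1$ to each of the moduli 7, 11, 13, so that a number is congruent to the alternating sum of its three-digit blocks, taken from the right. A short Python sketch of the test (the function name is an assumption of this example):

```python
def alternating_block_sum(n):
    """Alternating sum of the three-digit blocks of n, starting from the right."""
    total, sign = 0, 1
    while n > 0:
        n, block = divmod(n, 1000)
        total += sign * block
        sign = -sign
    return total

n = 29310478561
s = alternating_block_sum(n)
print(s, [d for d in (7, 11, 13) if s % d == 0])   # 364 [7, 13]
```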

9.6. Decimals with the maximum period. We observe when learning
elementary arithmetic that

$\frac17 = .\dot{1}4285\dot{7}$, $\frac27 = .\dot{2}8571\dot{4}$, ..., $\frac67 = .\dot{8}5714\dot{2}$,

the digits in each of the periods differing only by a cyclic permutation.

Consider, more generally, the decimal for the reciprocal of a prime $q$.
The number of digits in the period is the order of 10 (mod $q$), and is a
divisor of $\phi(q) = q - 1$. If this order is $q - 1$, i.e. if 10 is a primitive root
of $q$, then the period has $q - 1$ digits, the maximum number possible.

We convert $1/q$ into a decimal by dividing successive powers of 10 by
$q$; thus

$\frac{10^n}{q} = 10^n x_n + f_{n+1}$,

in the notation of § 9.1. The later stages of the process depend only upon
the value of $f_{n+1}$, and the process recurs so soon as $f_{n+1}$ repeats a value. If,
as here, the period contains $q - 1$ digits, then the remainders

$f_2, f_3, \ldots, f_q$

must all be different, and must be a permutation of the fractions

$\frac{1}{q}, \frac{2}{q}, \ldots, \frac{q - 1}{q}$.

The last remainder $f_q$ is $1/q$.

The corresponding remainders when we convert p/q into a decimal are 


• • • ,pfq, 

reduced (mod 1). These are, by Theorem 58, the same numbers in a differ- 
ent order, and the sequence of digits, after the occurrence of a particular 
remainder s/q, is the same as it was after the occurrence of s/q before. 
Hence the two decimals differ only by a cyclic permutation of the period. 

What happens with 7 will happen with any q of which 10 is a primitive 
root. Very little is known about these q, but the q below 50 which satisfy 
the condition are 


7, 17, 19, 23, 29, 47.
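The list of primes can be recomputed directly from the definition. The sketch below (Python, with illustrative function names that do not come from the text) finds the order of 10 modulo $q$ by repeated multiplication and produces one period of $p/q$ by long division.

```python
def order_of_10(q):
    """Multiplicative order of 10 modulo q; q must be prime to 10."""
    k, power = 1, 10 % q
    while power != 1:
        power = (power * 10) % q
        k += 1
    return k


def period_digits(p, q):
    """Digits of one period of p/q, for 0 < p < q and q prime to 10."""
    digits, remainder = [], p
    for _ in range(order_of_10(q)):
        remainder *= 10
        digits.append(str(remainder // q))
        remainder %= q
    return "".join(digits)


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Primes below 50, other than 2 and 5, of which 10 is a primitive root.
full_period = [q for q in range(3, 50)
               if is_prime(q) and q != 5 and order_of_10(q) == q - 1]
print(full_period)
print([period_digits(p, 7) for p in range(1, 7)])
```

Every period printed for $q = 7$ is a cyclic rearrangement of 142857, as the argument above predicts.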

Theorem 139. If $q$ is a prime, and 10 is a primitive root of $q$, then the
decimals for

$$\frac{p}{q} \quad (p = 1, 2, \ldots, q - 1)$$

have periods of length $q - 1$ and differing only by cyclic permutation.

9.7. Bachet’s problem of the weights. What is the least number
of weights which will weigh any integral number of pounds up to 40
(a) when weights may be put into one pan only and (b) when weights
may be put into either pan?

The second problem is the more interesting. We can dispose of the first 
by proving 

Theorem 140. Weights $1, 2, 4, \ldots, 2^{n-1}$ will weigh any integral weight
up to $2^n - 1$; and no other set of so few as $n$ weights is equally effective
(i.e. will weigh so long an unbroken sequence of weights from 1).

Any positive integer up to $2^n - 1$ inclusive can be expressed uniquely
as a binary decimal of $n$ figures, i.e. as a sum

$$\sum_{s=0}^{n-1} a_s 2^s,$$

where every $a_s$ is 0 or 1. Hence our weights will do what is wanted, and
‘without waste’ (no two arrangements of them producing the same result).
Since there is no waste, no other selection of weights can weigh a longer
sequence.

Finally, one weight must be 1 (to weigh 1); one must be 2 (to weigh 2);
one must be 4 (to weigh 4); and so on. Hence $1, 2, 4, \ldots, 2^{n-1}$ is the only
system of weights which will do what is wanted.

It is to be observed that Bachet’s number 40, not being of the form $2^n - 1$,
is not chosen appropriately for this problem. The weights 1, 2, 4, 8, 16, 32
will weigh up to 63, and no combination of 5 weights will weigh beyond 31.
But the solution for 40 is not unique; the weights 1, 2, 4, 8, 9, 16 will also
weigh any weight up to 40.
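The numerical claims about one-pan weighing are easy to confirm by enumerating subset sums. This is a small Python sketch; the helper names are illustrative only.

```python
def one_pan_totals(weights):
    """All totals obtainable by placing some of the weights in one pan."""
    totals = {0}
    for w in weights:
        totals |= {t + w for t in totals}
    return totals


def unbroken_reach(weights):
    """Largest n such that every integer 1, 2, ..., n can be weighed."""
    totals = one_pan_totals(weights)
    n = 0
    while n + 1 in totals:
        n += 1
    return n


print(unbroken_reach([1, 2, 4, 8, 16, 32]))
print(unbroken_reach([1, 2, 4, 8, 16]))
print(unbroken_reach([1, 2, 4, 8, 9, 16]))
```

The second system of six weights reaches 40 but "wastes" arrangements, since several totals can be formed in more than one way.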

Passing to the second problem, we prove 

Theorem 141. Weights $1, 3, 3^2, \ldots, 3^{n-1}$ will weigh any weight up to
$\frac12(3^n - 1)$, when weights may be placed in either pan; and no other set of
so few as $n$ weights is equally effective.

(1) Any positive integer up to $3^n - 1$ inclusive can be expressed uniquely
by $n$ digits in the ternary scale, i.e. as a sum

$$\sum_{s=0}^{n-1} a_s 3^s,$$

where every $a_s$ is 0, 1, or 2. Subtracting

$$1 + 3 + 3^2 + \cdots + 3^{n-1} = \tfrac12(3^n - 1),$$

we see that every positive or negative integer between $-\frac12(3^n - 1)$ and
$\frac12(3^n - 1)$ inclusive can be expressed uniquely in the form

$$\sum_{s=0}^{n-1} b_s 3^s,$$

where every $b_s$ is $-1$, 0, or 1. Hence our weights, placed in either pan, will
weigh any weight between these limits.† Since there is no waste, no other
combination of $n$ weights can weigh a longer sequence.

† Counting the weight to be weighed positive if it is placed in one pan and negative if it is placed
in the other.

(2) The proof that no other combination will weigh so long a sequence
is a little more troublesome. It is plain, since there must be no waste, that
the weights must all differ. We suppose that they are

$$w_1 < w_2 < \cdots < w_n.$$

The two largest weighable weights are plainly

$$W = w_1 + w_2 + \cdots + w_n, \quad W_1 = w_2 + \cdots + w_n.$$

Since $W_1 = W - 1$, $w_1$ must be 1.
The next weighable weight is

$$-w_1 + w_2 + w_3 + \cdots + w_n = W - 2,$$

and the next must be

$$w_1 + w_3 + w_4 + \cdots + w_n.$$

Hence $w_1 + w_3 + \cdots + w_n = W - 3$ and $w_2 = 3$.

Suppose now that we have proved that

$$w_1 = 1, \quad w_2 = 3, \quad \ldots, \quad w_s = 3^{s-1}.$$

If we can prove that $w_{s+1} = 3^s$, the conclusion will follow by induction.
The largest weighable weight $W$ is

$$W = \sum_{t=1}^{s} w_t + \sum_{t=s+1}^{n} w_t.$$

Leaving the weights $w_{s+1}, \ldots, w_n$ undisturbed, and removing some of the
other weights, or transferring them to the other pan, we can weigh every
weight down to

$$-\sum_{t=1}^{s} w_t + \sum_{t=s+1}^{n} w_t = W - (3^s - 1),$$

but none below. The next weight less than this is $W - 3^s$, and this must be

$$w_1 + w_2 + \cdots + w_s + w_{s+2} + w_{s+3} + \cdots + w_n.$$

Hence

$$w_{s+1} = 2(w_1 + w_2 + \cdots + w_s) + 1 = 3^s,$$

the conclusion required.

Bachet’s problem corresponds to the case $n = 4$.
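The representation with digits $-1$, 0, 1 used in the proof of Theorem 141 can be computed digit by digit. The following Python sketch is illustrative only (the function name and the sign convention for the two pans are assumptions, not part of the text): a digit $+1$ puts the weight $3^s$ in the pan opposite the load, and $-1$ puts it in the same pan as the load.

```python
def balanced_ternary(m, n):
    """Digits b_0, ..., b_{n-1}, each -1, 0 or 1, with m = sum(b_s * 3**s).

    Requires |m| <= (3**n - 1) / 2.
    """
    if abs(m) > (3 ** n - 1) // 2:
        raise ValueError("m is outside the weighable range")
    digits = []
    for _ in range(n):
        r = m % 3          # 0, 1 or 2, also for negative m in Python
        if r == 2:
            r = -1         # use the digit -1 and carry one to the next place
        digits.append(r)
        m = (m - r) // 3
    return digits


# Bachet's case n = 4: weights 1, 3, 9, 27 weigh everything up to 40.
for load in (5, 22, 40):
    print(load, balanced_ternary(load, 4))
```

For example, 5 is weighed as $9 - 3 - 1$, so the weights 1 and 3 go in the pan with the load and 9 goes in the other.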

9.8. The game of Nim. The game of Nim is played as follows. Any 
number of matches are arranged in heaps, the number of heaps, and 
the number of matches in each heap, being arbitrary. There are two players, 
A and B. The first player A takes any number of matches from a heap; he 
may take one only, or any number up to the whole of the heap, but he must 
touch one heap only. B then makes a move conditioned similarly, and the 
players continue to take alternately. The player who takes the last match 
wins the game. 

The game has a precise mathematical theory, and one or other player can 
always force a win. 

We define a winning position as a position such that if one player P (A
or B) can secure it by his move, leaving his opponent Q (B or A) to move
next, then, whatever Q may do, P can play so as to win the game. Any
other position we call a losing position.

For example, the position

[Figure: two heaps of two matches each]

or (2, 2), is a winning position. If A leaves this position to B, B must take
one match from a heap, or two. If B takes two, A takes the remaining two.
If B takes one, A takes one from the other heap; and in either case A wins.
Similarly, as the reader will easily verify,

[Figure: heaps of one, two, and three matches]

or (1, 2, 3), is a winning position.

We next define a correct position. We express the number of matches in
each heap in the binary scale, and form a figure F by writing them down
one under the other. Thus (2, 2), (1, 2, 3), and (2, 3, 6, 7) give the figures

        10        01        010
        10        10        011
        --        11        110
        20        --        111
                  22        ---
                            242 ;

it is convenient to write 01, 010, ... for 1, 10, ... so as to equalize the
number of figures in each row. We then add up the columns, as indicated in
the figures. If the sum of each column is even (as in the cases shown) then
the position is ‘correct’. An incorrect position is one which is not correct:
thus (1, 3, 4) is incorrect.

Theorem 142. A position in Nim is a winning position if and only if it is 
correct. 

(1) Consider first the special case in which no heap contains more than 
one match. It is plain that the position is winning if the number of matches 
left is even, and losing if it is odd; and that the same conditions define 
correct and incorrect positions. 

(2) Suppose that P has to take from a correct position. He must replace
one number defining a row of F by a smaller number. If we replace any
number, expressed in the binary scale, by a smaller number, we change
the parity of at least one of its digits. Hence when P takes from a correct
position, he necessarily transforms it into an incorrect position.

(3) If a position is incorrect, then the sum of at least one column of F is
odd. Suppose, to fix our ideas, that the sums of the columns are

even, even, odd, even, odd, even.

Then there is at least one 1 in the third column (the first with an odd sum).
Suppose (again to fix our ideas) that one row in which this happens is

      * *
    011101,

the asterisks indicating that the numbers below them are in columns whose
sum is odd. We can replace this number by the smaller number

      * *
    010111,

in which the digits with an asterisk, and those only, are altered. Plainly
this change corresponds to a possible move, and makes the sum of every
column even; and the argument is general. Hence P, if presented with an
incorrect position, can always convert it into a correct position.

(4) If A leaves a correct position, B is compelled to convert it into an 
incorrect position, and A can then move so as to restore a correct position. 
This process will continue until every heap is exhausted or contains one 
match only. The theorem is thus reduced to the special case already proved. 

The issue of the game is now clear. In general, the original position will
be incorrect, and the first player wins if he plays properly. But he loses
if the original position happens to be correct and the second player plays
properly.†

† When playing against an opponent who does not know the theory of the game, there is no need
to play strictly according to rule. The experienced player can play at random until he recognizes a
winning position of a comparatively simple type. It is quite enough to know that

    (1, 2n, 2n + 1), (n, 7 - n, 7), (2, 3, 4, 5)

are winning positions; that

    (1, 2n + 1, 2n + 2)

is a losing position; and that a combination of two winning positions is a winning position. The winning
move is not always unique. The position

    (1, 3, 9, 27)

is incorrect, and the only move which makes it correct is to take 16 from the 27. The position

    (3, 5, 7, 8, 11)

is also incorrect, but may be made correct by taking 2 from the 3, the 7, or the 11.

There is a variation in which the player who takes the last match loses.
The theory is the same so long as a heap remains containing more than one
match; thus (2, 2) and (1, 2, 3) are still winning positions. We leave it to
the reader to think out for himself the small variations in tactics at the end
of the game.
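The rule of Theorem 142 is convenient to mechanize, because "every column of F has an even sum" is the same as saying that the bitwise exclusive-or of the heap sizes is zero. The sketch below is a Python illustration under that observation; the function names are not from the text.

```python
from functools import reduce
from operator import xor


def is_correct(heaps):
    """True when every binary column of the figure F has an even sum."""
    return reduce(xor, heaps, 0) == 0


def correcting_moves(heaps):
    """All moves (heap index, matches to take) that leave a correct position."""
    odd_columns = reduce(xor, heaps, 0)
    moves = []
    for i, h in enumerate(heaps):
        target = h ^ odd_columns   # flip exactly the digits in odd columns
        if target < h:             # only a reduction is a legal move
            moves.append((i, h - target))
    return moves


print(is_correct([2, 3, 6, 7]), is_correct([1, 3, 4]))
print(correcting_moves([1, 3, 9, 27]))
print(correcting_moves([3, 5, 7, 8, 11]))
```

From a correct position the list of correcting moves is empty, which is step (2) of the proof; from an incorrect one it is never empty, which is step (3).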

9.9. Integers with missing digits. There is a familiar paradox† concerning
integers from whose expression in the decimal scale some particular
digit such as 9 is missing. It might seem at first as if this restriction should
only exclude ‘about one-tenth’ of the integers, but this is far from the truth.

Theorem 143. Almost all numbers‡ contain a 9, or any given sequence
of digits such as 937. More generally, almost all numbers, when expressed
in any scale, contain every possible digit, or possible sequence of digits.

Suppose that the scale is $r$, and that $\nu$ is a number whose decimal misses
the digit $b$. The number of $\nu$ for which $r^{l-1} \le \nu < r^l$ is $(r-1)^l$ if $b = 0$
and $(r-2)(r-1)^{l-1}$ if $b \ne 0$, and in any case does not exceed $(r-1)^l$.
Hence, if

$$r^{k-1} \le n < r^k,$$

the number $N(n)$ of $\nu$ up to $n$ does not exceed

$$r - 1 + (r-1)^2 + \cdots + (r-1)^k \le k(r-1)^k;$$

and

$$\frac{N(n)}{n} \le \frac{k(r-1)^k}{r^{k-1}} = kr\left(\frac{r-1}{r}\right)^k,$$

which tends to 0 when $n \to \infty$.

The statements about sequences of digits need no additional proof, since,
for example, the sequence 937 in the scale of 10 may be regarded as a single
digit in the scale of 1000.

The ‘paradox’ is usually stated in a slightly stronger form, viz.

Theorem 144. The sum of the reciprocals of the numbers which miss a given digit is
convergent.

The number of $\nu$ between $r^{k-1}$ and $r^k$ is at most $(r-1)^k$. Hence

$$\sum \frac{1}{\nu} = \sum_{k=1}^{\infty} \sum_{r^{k-1} \le \nu < r^k} \frac{1}{\nu} < \sum_{k=1}^{\infty} \frac{(r-1)^k}{r^{k-1}} = (r-1) \sum_{k=1}^{\infty} \left(\frac{r-1}{r}\right)^{k-1} = r(r-1).$$

† Relevant in controversies about telephone directories.
‡ In the sense of § 1.6.

We shall discuss next some analogous, but more interesting, properties 
of infinite decimals. We require a few elementary notions concerning the 
measure of point-sets or sets of real numbers. 

9.10. Sets of measure zero. A real number $x$ defines a ‘point’ of
the continuum. In what follows we use the words ‘number’ and ‘point’
indifferently, saying, for example, that ‘$P$ is the point $x$’.

An aggregate of real numbers is called a set of points. Thus the set $T$
defined by

$$x = \frac{1}{n} \quad (n = 1, 2, 3, \ldots),$$

the set $R$ of all rationals between 0 and 1 inclusive, and the set $C$ of all real
numbers between 0 and 1 inclusive, are sets of points.

An interval $(x - \delta, x + \delta)$, where $\delta$ is positive, is called a neighbourhood
of x. If S is a set of points, and every neighbourhood of x includes an 
infinity of points of S, then x is called a limit point of S. The limit point 
may or may not belong to S, but there are points of S as near to it as we 
please. Thus T has one limit point, x = 0, which does not belong to T. 
Every x between 0 and 1 is a limit point of R. 

The set S' of limit points of S is called the derived set or derivative of 
S. Thus C is the derivative of R. If S includes S', i.e. if every limit point 
of S belongs to S, then S is said to be closed. Thus C is closed. If S' includes 
S, i.e., if every point of S is a limit point of S, then S is said to be dense in 
itself. If S and S' are identical (so that S is both closed and dense in itself), 
then S is said to be perfect. Thus C is perfect. A less trivial example will
be found in § 9.11.

A set S is said to be dense in an interval (a, b) if every point of (a, b) 
belongs to S'. Thus R is dense in (0, 1). 

If $S$ can be included in a set $J$ of intervals, finite or infinite in number,
whose total length is as small as we please, then $S$ is said to be of measure
zero. Thus $T$ is of measure zero. We include the point $1/n$ in the interval

$$\frac{1}{n} - 2^{-n-1}\delta, \quad \frac{1}{n} + 2^{-n-1}\delta$$

of length $2^{-n}\delta$, and the sum of all these intervals (without allowance for
possible overlapping) is

$$\delta \sum_{n=1}^{\infty} 2^{-n} = \delta,$$

which we may suppose as small as we please.

Generally, any enumerable set is of measure zero. A set is enumerable
if its members can be correlated, as

$$x_1, x_2, \ldots, x_n, \ldots, \tag{9.10.1}$$

with the integers $1, 2, \ldots, n, \ldots$. We include $x_n$ in an interval of length
$2^{-n}\delta$, and the conclusion follows as in the special case of $T$.

A subset of an enumerable set is finite or enumerable. The sum of an 
enumerable set of enumerable sets is enumerable. 

The rationals may be arranged as

$$\frac01, \frac11, \frac12, \frac13, \frac23, \frac14, \frac34, \frac15, \frac25, \frac35, \ldots$$

and so in the form (9.10.1). Hence $R$ is enumerable, and therefore of measure
zero. A set of measure zero is sometimes called a null set; thus $R$ is
null. Null sets are negligible for many mathematical purposes, particularly
in the theory of integration.
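The enumeration and the covering argument can be made concrete. In this Python sketch (illustrative names, exact arithmetic via the standard `fractions` module) the rationals of $[0, 1]$ are generated in the order shown, and the $n$th is allotted an interval of length $2^{-n}\delta$.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals():
    """Yield the rationals of [0, 1] as 0/1, 1/1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0, 1)
    yield Fraction(1, 1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:      # lowest terms only, so nothing repeats
                yield Fraction(p, q)
        q += 1


def cover_length(delta, terms):
    """Total length of the first `terms` intervals, the n-th of length delta / 2**n."""
    return sum(Fraction(delta) / 2 ** n for n in range(1, terms + 1))


print(list(islice(rationals(), 10)))
print(cover_length(Fraction(1, 100), 40) < Fraction(1, 100))
```

However many rationals are covered, the total length used stays below $\delta$, which is the content of the statement that $R$ is null.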

The sum $S$ of an enumerable infinity of null sets $S_n$ (i.e. the set formed
by all the points which belong to some $S_n$) is null. For we may include $S_n$
in a set of intervals of total length $2^{-n}\delta$, and so $S$ in a set of intervals of
total length not greater than $\delta \sum 2^{-n} = \delta$.

Finally, we say that almost all points of an interval $I$ possess a property
if the set of points which do not possess the property is null. This sense of
the phrase should be compared with the sense defined in § 1.6 and used in
§ 9.9. It implies in either case that ‘most’ of the numbers under consideration
(the positive integers in §§ 1.6 and 9.9, the real numbers here) possess the
property, and that other numbers are ‘exceptional’.†

† Our explanations here contain the minimum necessary for the understanding of §§ 9.11-13 and a
few later passages in the book. In particular, we have not given any general definition of the measure
of a set. There are fuller accounts of all these ideas in the standard treatises on analysis.

9.11. Decimals with missing digits. The decimal

$$\tfrac17 = .\dot{1}4285\dot{7}$$

has four missing digits, viz. 0, 3, 6, 9. But it is easy to prove that decimals
which miss digits are exceptional.

We define $S$ as the set of points between 0 (inclusive) and 1 (exclusive)
whose decimals, in the scale of $r$, miss the digit $b$. This set may be generated
as follows.

We divide (0, 1) into $r$ equal parts

$$\frac{s}{r} \le x < \frac{s+1}{r} \quad (s = 0, 1, \ldots, r - 1);$$

the left-hand end point, but not the right-hand one, is included. The $s$th
part contains just the numbers whose decimals begin with $s - 1$, and if we
remove the $(b + 1)$th part, we reject the numbers whose first digit is $b$.

We next divide each of the $r - 1$ remaining intervals into $r$ equal parts
and remove the $(b + 1)$th part of each of them. We have then rejected all
numbers whose first or second digit is $b$. Repeating the process indefinitely,
we reject all numbers in which any digit is $b$; and $S$ is the set which
remains.

In the first stage of the construction we remove one interval of length $1/r$;
in the second, $r - 1$ intervals of length $1/r^2$, i.e. of total length $(r - 1)/r^2$;
in the third, $(r - 1)^2$ intervals of total length $(r - 1)^2/r^3$; and so on. What
remains after $k$ stages is a set $J_k$ of intervals whose total length is

$$1 - \sum_{l=1}^{k} \frac{(r-1)^{l-1}}{r^l},$$

and this set includes $S$ for every $k$. Since

$$1 - \sum_{l=1}^{k} \frac{(r-1)^{l-1}}{r^l} \to 1 - \frac{1}{r}\left\{1 - \frac{r-1}{r}\right\}^{-1} = 0$$

when $k \to \infty$, the total length of $J_k$ is small when $k$ is large; and $S$ is
therefore null.
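The total length remaining after $k$ stages can be evaluated exactly. The Python sketch below (the function name is illustrative) computes the sum in the text with rational arithmetic and compares it with $((r-1)/r)^k$, which is the closed form one expects since each stage keeps the fraction $(r-1)/r$ of what was left.

```python
from fractions import Fraction


def remaining_length(r, k):
    """Total length of J_k: 1 minus the lengths removed in stages 1, ..., k."""
    removed = sum(Fraction((r - 1) ** (l - 1), r ** l) for l in range(1, k + 1))
    return 1 - removed


for k in (1, 2, 5, 10, 50):
    print(k, float(remaining_length(10, k)))
```

For the decimal scale the remaining length after 50 stages is already below one per cent.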


Theorem 145. The set of points whose decimals, in any scale, miss any
digit is null: almost all decimals contain all possible digits.

The result may be extended to cover combinations of digits. If the
sequence 937 never occurs in the ordinary decimal for $x$, then the digit
‘937’ never occurs in the decimal in the scale of 1000. Hence

Theorem 146. Almost all decimals, in any scale, contain all possible
sequences of any number of digits.

Returning to Theorem 145, suppose that $r = 3$ and $b = 1$. The set $S$ is
formed by rejecting the middle third $(\frac13, \frac23)$ of (0, 1), then the middle thirds
$(\frac19, \frac29)$, $(\frac79, \frac89)$ of $(0, \frac13)$ and $(\frac23, 1)$, and so on. The set which remains
is null.

It is immaterial for this conclusion whether we reject or retain the end
points of rejected intervals, since their aggregate is enumerable and therefore
null. In fact our definition rejects some, such as $1/3 = .1$, and includes
others, such as $2/3 = .2$.

The set becomes more interesting if we retain all end points. In this
case (if we wish to preserve the arithmetical definition) we must allow
ternary decimals ending in $\dot{2}$ (and excluded in our account of decimals at the
beginning of the chapter). All fractions $p/3^n$ have then two representations,
such as

$$\tfrac13 = .1 = .0\dot{2}$$

(and it was for this reason that we made the restriction); and an end point
of a rejected interval has always one without a 1.

The set $S$ thus defined is called Cantor’s ternary set.

Suppose that $x$ is any point of (0, 1), except 0 or 1. If $x$ does not belong
to $S$, it lies inside a rejected interval, and has neighbourhoods free from
points of $S$, so that it does not belong to $S'$. If $x$ does belong to $S$, then
all its neighbourhoods contain other points of $S$; for otherwise there would
be one containing $x$ only, and two rejected intervals would abut. Hence $x$
belongs to $S'$. Thus $S$ and $S'$ are identical, and $S$ is perfect.

Theorem 147. Cantor’s ternary set is a perfect set of measure zero.
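For rational points, membership of Cantor's ternary set (with all end points retained) can be decided by following the construction. The Python sketch below is an illustration, not a method taken from the text: a point in the left third is rescaled by $x \mapsto 3x$, a point in the right third by $x \mapsto 3x - 2$, and a point strictly inside a middle third is rejected. Since the rescaled values are rationals with bounded denominator, the process must eventually repeat, and a repetition means the point is never rejected.

```python
from fractions import Fraction


def in_cantor_set(x):
    """Decide whether the rational x in [0, 1] survives every middle-third removal."""
    seen = set()
    while x not in seen:
        seen.add(x)
        if x <= Fraction(1, 3):
            x = 3 * x          # left third, ternary digit 0
        elif x >= Fraction(2, 3):
            x = 3 * x - 2      # right third, ternary digit 2
        else:
            return False       # strictly inside a rejected interval
    return True


for x in (Fraction(1, 3), Fraction(1, 4), Fraction(1, 2), Fraction(7, 9)):
    print(x, in_cantor_set(x))
```

The point $\frac13$ is accepted because, with end points retained, it has the representation $.0\dot{2}$ free from the digit 1.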

9.12. Normal numbers. The theorems proved in the last section
express much less than the full truth. Actually it is true, for example, not
only that almost all decimals contain a 9, but that, in almost all decimals,
9 occurs with the proper frequency, that is to say in about one-tenth of the
possible places.

Suppose that $x$ is expressed in the scale of $r$, and that the digit $b$ occurs
$n_b$ times in the first $n$ places. If

$$\frac{n_b}{n} \to \beta$$

when $n \to \infty$, then we say that $b$ has frequency $\beta$. It is naturally not necessary
that such a limit should exist; $n_b/n$ may oscillate, and one might expect
that usually it would. The theorems which follow prove that, contrary to
our expectation, there is usually a definite frequency. The existence of the
limit is in a sense the ordinary event.

We say that $x$ is simply normal in the scale of $r$ if

$$\frac{n_b}{n} \to \frac{1}{r} \tag{9.12.1}$$

for each of the $r$ possible values of $b$. Thus

$$x = .\dot{0}12345678\dot{9}$$

is simply normal in the scale of 10. The same $x$ may be expressed in the
scale of $10^{10}$, when its expression is

$$x = .\dot{b},$$

where $b = 123456789$. It is plain that in this scale $x$ is not simply normal,
$10^{10} - 1$ digits being missing.

This remark leads us to a more exacting definition. We say that $x$ is
normal in the scale of $r$ if all of the numbers

$$x, rx, r^2x, \ldots \text{†}$$

are simply normal in all of the scales

$$r, r^2, r^3, \ldots.$$

It follows at once that, when $x$ is expressed in the scale of $r$, every
combination

$$b_1 b_2 \ldots b_k$$

of digits occurs with the proper frequency; i.e. that, if $n_b$ is the number of
occurrences of this sequence in the first $n$ digits of $x$, then

$$\frac{n_b}{n} \to \frac{1}{r^k} \tag{9.12.2}$$

when $n \to \infty$.

† Strictly, the fractional parts of these numbers (since we have been considering numbers between
0 and 1). A number greater than 1 is simply normal, or normal, if its fractional part is simply normal,
or normal.

Our main theorem, which includes and goes beyond those of § 9.11, is

Theorem 148. Almost all numbers are normal in any scale.


9.13. Proof that almost all numbers are normal. It is sufficient to
prove that almost all numbers are simply normal in a given scale. For
suppose that this has been proved, and that $S(x, r)$ is the set of numbers
$x$ which are not simply normal in the scale of $r$. Then $S(x, r)$, $S(x, r^2)$,
$S(x, r^3)$, ... are null, and therefore their sum is null. Hence the set $T(x, r)$
of numbers which are not simply normal in all the scales $r, r^2, \ldots$ is null.
The set $T(rx, r)$ of numbers such that $rx$ is not simply normal in all these
scales is also null; and so are $T(r^2x, r)$, $T(r^3x, r)$, .... Hence again the sum
of these sets, i.e. the set $U(x, r)$ of numbers which are not normal in the
scale of $r$, is null. Finally, the sum of $U(x, 2)$, $U(x, 3)$, ... is null; and this
proves the theorem.

We have therefore only to prove that (9.12.1) is true for almost all numbers
$x$. We may suppose that $n$ tends to infinity through multiples of $r$, since
(9.12.1) is true generally if it is true for $n$ so restricted.

The number of $r$-ary decimals of $n$ figures, with just $m$ $b$’s in assigned
places, is $(r - 1)^{n-m}$. Hence the number of such decimals which contain
just $m$ $b$’s, in one place or another, is†

$$p(n, m) = \frac{n!}{m!\,(n-m)!}(r-1)^{n-m}.$$

† $p(n, m)$ is the term in $(r - 1)^{n-m}$ in the binomial expansion of $\{1 + (r - 1)\}^n$.

We consider any decimal, and the incidence of $b$’s among its first $n$ digits,
and call

$$\mu = m - \frac{n}{r} = m - n^*$$

the $n$-excess of $b$ (the excess of the actual number of $b$’s over the number
to be expected). Since $n$ is a multiple of $r$, $n^*$ and $\mu$ are integers. Also

$$-\frac{1}{r} \le \frac{\mu}{n} \le 1 - \frac{1}{r}. \tag{9.13.1}$$

We have

$$\frac{p(n, m+1)}{p(n, m)} = \frac{n-m}{(r-1)(m+1)} = \frac{(r-1)n - r\mu}{(r-1)n + r(r-1)(\mu+1)}. \tag{9.13.2}$$

Hence

$$\frac{p(n, m+1)}{p(n, m)} > 1 \quad (\mu = -1, -2, \ldots), \qquad \frac{p(n, m+1)}{p(n, m)} < 1 \quad (\mu = 0, 1, 2, \ldots);$$

so that $p(n, m)$ is greatest when

$$\mu = 0, \quad m = n^*.$$

If $\mu \ge 0$, then, by (9.13.2),

$$\frac{p(n, m+1)}{p(n, m)} = \frac{(r-1)n - r\mu}{(r-1)n + r(r-1)(\mu+1)} < 1 - \frac{r}{r-1}\frac{\mu}{n} \le \exp\left(-\frac{r}{r-1}\frac{\mu}{n}\right). \tag{9.13.3}$$

If $\mu < 0$ and $\nu = |\mu|$, then

$$\frac{p(n, m-1)}{p(n, m)} = \frac{(r-1)m}{n-m+1} = \frac{(r-1)n - r(r-1)\nu}{(r-1)n + r(\nu+1)} < 1 - \frac{r\nu}{n} < \exp\left(-\frac{r\nu}{n}\right) = \exp\left(-\frac{r|\mu|}{n}\right). \tag{9.13.4}$$
We now fix a positive $\delta$, and consider the decimals for which

$$|\mu| \ge \delta n \tag{9.13.5}$$

for a given $n$. Since $n$ is to be large, we may suppose that $|\mu| \ge 2$. If $\mu$ is
positive then, by (9.13.3),

$$\frac{p(n, m)}{p(n, m-\mu)} = \frac{p(n, m)}{p(n, m-1)} \cdot \frac{p(n, m-1)}{p(n, m-2)} \cdots \frac{p(n, m-\mu+1)}{p(n, m-\mu)}$$

$$< \exp\left\{-\frac{r}{r-1}\,\frac{(\mu-1) + (\mu-2) + \cdots + 1}{n}\right\} = \exp\left\{-\frac{r\mu(\mu-1)}{2(r-1)n}\right\} < e^{-K\mu^2/n},$$

where $K$ is a positive number which depends only on $r$. Since

$$p(n, m-\mu) = p(n, n^*) < r^n,\text{†}$$

it follows that

$$p(n, m) < r^n e^{-K\mu^2/n}. \tag{9.13.6}$$

Similarly it follows from (9.13.4) that (9.13.6) is true also for negative $\mu$.

Let $S_n(\mu)$ be the set of numbers whose $n$-excess is $\mu$. There are $p =
p(n, m)$ numbers $\xi_1, \xi_2, \ldots, \xi_p$ represented by terminating decimals of $n$
figures and excess $\mu$, and the numbers of $S_n(\mu)$ are included in the intervals

$$\xi_s, \; \xi_s + r^{-n} \quad (s = 1, 2, \ldots, p).$$

Hence $S_n(\mu)$ is included in a set of intervals whose total length does not
exceed

$$r^{-n} p(n, m) < e^{-K\mu^2/n}.$$

And if $T_n(\delta)$ is the set of numbers whose $n$-excess satisfies (9.13.5), then
$T_n(\delta)$ can be included in a set of intervals whose length does not exceed

$$\sum_{|\mu| \ge \delta n} e^{-K\mu^2/n} = 2\sum_{\mu \ge \delta n} e^{-K\mu^2/n} \le 2\sum_{\mu \ge \delta n} e^{-\frac12 K\mu^2/n}\, e^{-\frac12 K\mu/n}$$

$$\le 2e^{-\frac12 K\delta^2 n} \sum_{\mu=0}^{\infty} e^{-\frac12 K\mu/n} = \frac{2e^{-\frac12 K\delta^2 n}}{1 - e^{-\frac12 K/n}} < Ln\,e^{-\frac12 K\delta^2 n},$$

where $L$, like $K$, depends only on $r$.

† Indeed $p(n, m) < r^n$ for all $m$.

We now fix $N$ (a multiple $N^*r$ of $r$), and consider the set $U_N(\delta)$ of
numbers such that (9.13.5) is true for some

$$n = n^*r \ge N = N^*r.$$

Then $U_N(\delta)$ is the sum of the sets

$$T_N(\delta), \; T_{N+r}(\delta), \; T_{N+2r}(\delta), \; \ldots,$$

i.e. the sets $T_n(\delta)$ for which $n = kr$ and $k \ge N^*$. It can therefore be included
in a set of intervals whose length does not exceed

$$L \sum_{k=N^*}^{\infty} kr\,e^{-\frac12 K\delta^2 kr} = \eta(N^*);$$

and $\eta(N^*) \to 0$ when $N$ and $N^*$ tend to infinity.

If $U(\delta)$ is the set of numbers whose $n$-excess satisfies (9.13.5) for an
infinity of $n$ (all multiples of $r$), then $U(\delta)$ is included in $U_N(\delta)$ for every
$N$, and can therefore be included in a set of intervals whose total length is
as small as we please. That is to say, $U(\delta)$ is null.

Finally, if $x$ is not simply normal, (9.12.1) is false (even when $n$ is
restricted to be a multiple of $r$), and

$$|\mu| \ge \zeta n$$

for some positive $\zeta$ and an infinity of multiples $n$ of $r$. This $\zeta$ is greater
than some one of the sequence $\delta, \frac12\delta, \frac14\delta, \ldots$, and so $x$ belongs to some
one of the sets

$$U(\delta), \; U(\tfrac12\delta), \; U(\tfrac14\delta), \; \ldots,$$

all of which are null. Hence the set of all such $x$ is null.

It might be supposed that, since almost all numbers are normal, it would
be easy to construct examples of normal numbers. There are in fact simple
constructions; thus the number

$$.123456789101112\ldots,$$

formed by writing down all the positive integers in order, in decimal notation,
is normal. But the proof that this is so is more troublesome than might
be expected.
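The digits of this number are easy to generate, and their frequencies can be tabulated, although a finite count is of course no evidence of normality in the sense of § 9.12. The Python sketch below uses illustrative names and only the standard library.

```python
from collections import Counter
from itertools import count


def concatenated_digits(n):
    """First n decimal digits of .123456789101112..."""
    digits = []
    for k in count(1):
        digits.extend(str(k))
        if len(digits) >= n:
            return "".join(digits[:n])


def digit_frequencies(digits):
    """Proportion n_b / n of each digit b among the given digits."""
    counts = Counter(digits)
    return {b: counts[b] / len(digits) for b in "0123456789"}


print(concatenated_digits(15))
for b, freq in digit_frequencies(concatenated_digits(2889)).items():
    print(b, round(freq, 4))
```

The first 2889 digits are exactly the integers from 1 to 999 written out; in that block the digit 0 is under-represented because no integer begins with it, an effect that diminishes only slowly as more integers are written.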


NOTES 

§ 9.4. For Theorem 138 see Pólya and Szegő, No. 257. The result is stated without proof
in W. H. and G. C. Youngs’ The theory of sets of points, 3.

§ 9.5. See Dickson, History, i, ch. xii. The test for 7, 11, and 13 is not mentioned
explicitly. It is explained by Grunert, Archiv der Math. und Phys. 42 (1864), 478-82.
Grunert gives slightly earlier references to Brilka and V. A. Lebesgue.

§§ 9.7-8. See Ahrens, ch. iii.

There is an interesting logical point involved in the definition of a ‘losing’ position in
Nim. We define a losing position as one which is not a winning position, i.e. as a position
such that P cannot force a win by leaving it to Q. It follows from our analysis of the game
that a losing position in this sense is also a losing position in the sense that Q can force a
win if P leaves such a position to Q. This is a case of a general theorem (due to Zermelo
and von Neumann) true of any game in which there are only two possible results and only
a finite choice of ‘moves’ at any stage. See D. König, Acta Univ. Hungaricae (Szeged), 3
(1927), 121-30.

§ 9.10. Our ‘limit point’ is the ‘limiting point’ of Hobson’s Theory of functions of a real
variable or the ‘Häufungspunkt’ of Hausdorff’s Mengenlehre.

§§ 9.12-13. Niven and Zuckerman (Pacific Journal of Math. 1 (1951), 103-9) and
Cassels (ibid. 2 (1952), 555-7) give proofs that, if (9.12.2) holds for every sequence of
digits, then $x$ is normal. This is the converse of our statement that (9.12.2) follows from the
definition; the proof of this converse is not trivial.

For the substance of these sections see Borel, Leçons sur la théorie des fonctions (2nd ed.,
1914), 182-216. Theorem 148 has been developed in various ways since it was originally
proved by Borel in 1909. For an account and bibliography, see Kuipers and Niederreiter,
69-78.

Champernowne (Journal London Math. Soc. 8 (1933), 254-60) proved that .123... is
normal. Copeland and Erdős (Bulletin Amer. Math. Soc. 52 (1946), 857-60) proved that, if
$a_1, a_2, \ldots$ is any increasing sequence of integers such that $a_n < n^{1+\epsilon}$ for every $\epsilon > 0$ and
$n > n_0(\epsilon)$, then the decimal

$$.a_1 a_2 a_3 \ldots$$

(formed by writing out the digits of the $a_n$ in any scale in order) is normal in that scale.

CONTINUED FRACTIONS

10.1. Finite continued fractions. We shall describe the function

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots + \cfrac{1}{a_N}}}} \tag{10.1.1}$$

of the $N + 1$ variables

$$a_0, a_1, \ldots, a_n, \ldots, a_N,$$

as a finite continued fraction, or, when there is no risk of ambiguity, simply
as a continued fraction. Continued fractions are important in many branches
of mathematics, and particularly in the theory of approximation to real
numbers by rationals. There are more general types of continued fractions
in which the ‘numerators’ are not all 1’s, but we shall not require them here.

The formula (10.1.1) is cumbrous, and we shall usually write the
continued fraction in one of the two forms

$$a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots \frac{1}{a_N}$$

or

$$[a_0, a_1, a_2, \ldots, a_N].$$

We call $a_0, a_1, \ldots, a_N$ the partial quotients, or simply the quotients, of the
continued fraction.

We find by calculation that†

$$[a_0] = \frac{a_0}{1}, \quad [a_0, a_1] = \frac{a_1 a_0 + 1}{a_1}, \quad [a_0, a_1, a_2] = \frac{a_2 a_1 a_0 + a_2 + a_0}{a_2 a_1 + 1};$$

and it is plain that

$$[a_0, a_1] = a_0 + \frac{1}{a_1}, \tag{10.1.2}$$

$$[a_0, a_1, \ldots, a_{n-1}, a_n] = \left[a_0, a_1, \ldots, a_{n-2}, a_{n-1} + \frac{1}{a_n}\right], \tag{10.1.3}$$

$$[a_0, a_1, \ldots, a_n] = a_0 + \frac{1}{[a_1, a_2, \ldots, a_n]} = [a_0, [a_1, a_2, \ldots, a_n]], \tag{10.1.4}$$

for $1 \le n \le N$. We could define our continued fraction by (10.1.2) and
either (10.1.3) or (10.1.4). More generally

$$[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_{m-1}, [a_m, a_{m+1}, \ldots, a_n]] \tag{10.1.5}$$

for $1 \le m < n \le N$.

† There is a clash between our notation here and that of § 6.11, which we shall use again later in
the chapter (for example in § 10.5). In § 6.11, $[x]$ was defined as the integral part of $x$; while here $[a_0]$
means simply $a_0$. The ambiguity should not confuse the reader, since we use $[a_0]$ here merely as a
special case of $[a_0, a_1, \ldots, a_n]$. The square bracket in this sense will seldom occur with a single letter
inside it, and will not then be important.

10.2. Convergents to a continued fraction. We call

$$[a_0, a_1, \ldots, a_n] \quad (0 \le n \le N)$$

the $n$th convergent to $[a_0, a_1, \ldots, a_N]$. It is easy to calculate the convergents
by means of the following theorem.

Theorem 149. If $p_n$ and $q_n$ are defined by

$$p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2} \quad (2 \le n \le N), \tag{10.2.1}$$

$$q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2} \quad (2 \le n \le N), \tag{10.2.2}$$

then

$$[a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}. \tag{10.2.3}$$

We have already verified the theorem for $n = 0$ and $n = 1$. Let us
suppose it to be true for $n \le m$, where $m < N$. Then

$$[a_0, a_1, \ldots, a_{m-1}, a_m] = \frac{p_m}{q_m} = \frac{a_m p_{m-1} + p_{m-2}}{a_m q_{m-1} + q_{m-2}},$$

and $p_{m-1}$, $p_{m-2}$, $q_{m-1}$, $q_{m-2}$ depend only on

$$a_0, a_1, \ldots, a_{m-1}.$$

Hence, using (10.1.3), we obtain

$$[a_0, a_1, \ldots, a_{m-1}, a_m, a_{m+1}] = \left[a_0, a_1, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}}\right]$$

$$= \frac{\left(a_m + \frac{1}{a_{m+1}}\right)p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right)q_{m-1} + q_{m-2}} = \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}}$$

$$= \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}} = \frac{p_{m+1}}{q_{m+1}},$$

and the theorem is proved by induction.

It follows from (10.2.1) and (10.2.2) that

(10.2.4)  $\dfrac{p_n}{q_n} = \dfrac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}}$.

Also

$p_n q_{n-1} - p_{n-1} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1}(a_n q_{n-1} + q_{n-2})$

$= -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})$.

Repeating the argument with $n - 1, n - 2, \ldots, 2$ in place of $n$, we obtain

$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}(p_1 q_0 - p_0 q_1) = (-1)^{n-1}$.

Also

$p_n q_{n-2} - p_{n-2} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-2} - p_{n-2}(a_n q_{n-1} + q_{n-2})$

$= a_n (p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = (-1)^n a_n$.

Theorem 150. The functions $p_n$ and $q_n$ satisfy

(10.2.5)  $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$

or

(10.2.6)  $\dfrac{p_n}{q_n} - \dfrac{p_{n-1}}{q_{n-1}} = \dfrac{(-1)^{n-1}}{q_{n-1} q_n}$.

Theorem 151. They also satisfy

(10.2.7)  $p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$

or

(10.2.8)  $\dfrac{p_n}{q_n} - \dfrac{p_{n-2}}{q_{n-2}} = \dfrac{(-1)^n a_n}{q_{n-2} q_n}$.
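
The identities (10.2.5) and (10.2.7) are easy to confirm numerically. The sketch below (Python; the helper names are illustrative and not from the text) recomputes $p_n$, $q_n$ from the recurrences and asserts both identities for every admissible $n$. It assumes at least two partial quotients are supplied.

```python
def convergent_lists(a):
    """Lists p, q with p[n] = p_n and q[n] = q_n; assumes len(a) >= 2."""
    p = [a[0], a[1] * a[0] + 1]
    q = [1, a[1]]
    for n in range(2, len(a)):
        p.append(a[n] * p[-1] + p[-2])
        q.append(a[n] * q[-1] + q[-2])
    return p, q

def check_identities(a):
    """Assert (10.2.5) for n >= 1 and (10.2.7) for n >= 2."""
    p, q = convergent_lists(a)
    for n in range(1, len(a)):
        assert p[n] * q[n - 1] - p[n - 1] * q[n] == (-1) ** (n - 1)
    for n in range(2, len(a)):
        assert p[n] * q[n - 2] - p[n - 2] * q[n] == (-1) ** n * a[n]
    return True

print(check_identities([3, 7, 15, 1, 292]))   # True
```

Neither identity uses the sign or integrality of the quotients, so the same check passes when $a_0$ is negative.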

10.3. Continued fractions with positive quotients. We now assign
numerical values to the quotients $a_n$, and so to the fraction (10.1.1) and to
its convergents. We shall always suppose that

(10.3.1)  $a_1 > 0, \ldots, a_N > 0$,†

and usually also that $a_n$ is integral, in which case the continued fraction
is said to be simple. But it is convenient first to prove three theorems
(Theorems 152-4 below) which hold for all continued fractions in which
the quotients satisfy (10.3.1). We write

$x_n = \dfrac{p_n}{q_n}$,  $x = x_N$,

so that the value of the continued fraction is $x_N$ or $x$.

It follows from (10.1.5) that

(10.3.2)  $[a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_{n-1}, [a_n, a_{n+1}, \ldots, a_N]]$

$= \dfrac{[a_n, a_{n+1}, \ldots, a_N]\, p_{n-1} + p_{n-2}}{[a_n, a_{n+1}, \ldots, a_N]\, q_{n-1} + q_{n-2}}$

for $2 \le n \le N$.

Theorem 152. The even convergents $x_{2n}$ increase strictly with $n$, while
the odd convergents $x_{2n+1}$ decrease strictly.

Theorem 153. Every odd convergent is greater than any even convergent.

† $a_0$ may be negative.

Theorem 154. The value of the continued fraction is greater than that of
any of its even convergents and less than that of any of its odd convergents
(except that it is equal to the last convergent, whether this be even or odd).

In the first place every $q_n$ is positive, so that, after (10.2.8) and (10.3.1),
$x_n - x_{n-2}$ has the sign of $(-1)^n$. This proves Theorem 152.

Next, after (10.2.6), $x_n - x_{n-1}$ has the sign of $(-1)^{n-1}$, so that

(10.3.3)  $x_{2m+1} > x_{2m}$.

If Theorem 153 were false, we should have $x_{2m+1} \le x_{2\mu}$ for some pair
$m, \mu$. If $\mu < m$, then, after Theorem 152, $x_{2m+1} < x_{2m}$, and if $\mu > m$, then
$x_{2\mu+1} < x_{2\mu}$; and either inequality contradicts (10.3.3).

Finally, $x = x_N$ is the greatest of the even, or the least of the odd
convergents, and Theorem 154 is true in either case.
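
Theorems 152-4 can be watched at work on a concrete fraction. The sketch below (Python, illustrative names) lists the convergents of $[3, 7, 15, 1, 292]$ and asserts the three orderings. The starting pair `(1, 0)` used to seed the recurrence is a common programming convention and an assumption of this sketch, not something the text defines; it merely reproduces $p_1 = a_1 a_0 + 1$ and $q_1 = a_1$. At least two quotients are assumed.

```python
from fractions import Fraction

def convergent_values(a):
    """Values x_0, ..., x_N of the convergents to [a_0, ..., a_N]."""
    p_prev, q_prev = 1, 0            # seeding convention (assumption of this sketch)
    p, q = a[0], 1                   # p_0 = a_0, q_0 = 1
    values = [Fraction(p, q)]
    for term in a[1:]:
        p, p_prev = term * p + p_prev, p
        q, q_prev = term * q + q_prev, q
        values.append(Fraction(p, q))
    return values

def check_ordering(a):
    x = convergent_values(a)
    even, odd = x[0::2], x[1::2]
    assert all(s < t for s, t in zip(even, even[1:]))   # Theorem 152, even case
    assert all(s > t for s, t in zip(odd, odd[1:]))     # Theorem 152, odd case
    assert max(even) < min(odd)                          # Theorem 153
    assert max(even) <= x[-1] <= min(odd)                # Theorem 154
    return even, odd

even, odd = check_ordering([3, 7, 15, 1, 292])
print(even)   # [Fraction(3, 1), Fraction(333, 106), Fraction(103993, 33102)]
print(odd)    # [Fraction(22, 7), Fraction(355, 113)]
```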

10.4. Simple continued fractions. We now suppose that the $a_n$ are
integral and the fraction simple. The rest of the chapter will be concerned
with the special properties of simple continued fractions, and other fractions
will occur only incidentally. It is plain that $p_n$ and $q_n$ are integers, and $q_n$
positive. If

$[a_0, a_1, a_2, \ldots, a_N] = \dfrac{p_N}{q_N} = x$,

we say that the number $x$ (which is necessarily rational) is represented by
the continued fraction. We shall see in a moment that, with one reservation,
the representation is unique.

Theorem 155. $q_n \ge q_{n-1}$ for $n \ge 1$, with inequality when $n > 1$.

Theorem 156. $q_n \ge n$, with inequality when $n > 3$.

In the first place, $q_0 = 1$, $q_1 = a_1 \ge 1$. If $n \ge 2$, then

$q_n = a_n q_{n-1} + q_{n-2} \ge q_{n-1} + 1$,

so that $q_n > q_{n-1}$ and $q_n \ge n$. If $n > 3$, then

$q_n \ge q_{n-1} + q_{n-2} > q_{n-1} + 1 \ge n$,

and so $q_n > n$.

A more important property of the convergents is

Theorem 157. The convergents to a simple continued fraction are in
their lowest terms.

For, by Theorem 150,

$d \mid p_n \;.\; d \mid q_n \;\rightarrow\; d \mid (-1)^{n-1} \;\rightarrow\; d \mid 1$.

10.5. The representation of an irreducible rational fraction by a simple
continued fraction. Any simple continued fraction $[a_0, a_1, \ldots, a_N]$
represents a rational number

$x = x_N$.

In this and the next section we prove that, conversely, every positive
rational $x$ is representable by a simple continued fraction, and that, apart
from one ambiguity, the representation is unique.

Theorem 158. If $x$ is representable by a simple continued fraction with
an odd (even) number of convergents, it is also representable by one with
an even (odd) number.

For, if $a_n \ge 2$,

$[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_n - 1, 1]$,

while, if $a_n = 1$,

$[a_0, a_1, \ldots, a_{n-1}, 1] = [a_0, a_1, \ldots, a_{n-2}, a_{n-1} + 1]$.

For example

$[2, 2, 3] = [2, 2, 2, 1]$.

This choice of alternative representations is often useful.

We call

$a'_n = [a_n, a_{n+1}, \ldots, a_N]$  $(0 \le n \le N)$

the n-th complete quotient of the continued fraction

$[a_0, a_1, \ldots, a_n, \ldots, a_N]$.

Thus

$x = a'_0$,  $x = \dfrac{a'_1 a_0 + 1}{a'_1}$

and

(10.5.1)  $x = \dfrac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}}$  $(2 \le n \le N)$.

Theorem 159. $a_n = [a'_n]$, the integral part of $a'_n$,† except that

$a_{N-1} = [a'_{N-1}] - 1$

when $a_N = 1$.

If $N = 0$, then $a_0 = a'_0 = [a'_0]$. If $N > 0$, then

$a'_n = a_n + \dfrac{1}{a'_{n+1}}$  $(0 \le n \le N - 1)$.

Now

$a'_{n+1} > 1$  $(0 \le n \le N - 1)$

except that $a'_{n+1} = 1$ when $n = N - 1$ and $a_N = 1$. Hence

(10.5.2)  $a_n < a'_n < a_n + 1$  $(0 \le n \le N - 1)$

and

$a_n = [a'_n]$  $(0 \le n \le N - 1)$

except in the case specified. And in any case

$a_N = a'_N = [a'_N]$.

† We revert here to our habitual use of the square bracket in accordance with the definition of § 6.11.

Theorem 160. If two simple continued fractions

$[a_0, a_1, \ldots, a_N]$,  $[b_0, b_1, \ldots, b_M]$

have the same value $x$, and $a_N > 1$, $b_M > 1$, then $M = N$ and the fractions
are identical.

When we say that two continued fractions are identical we mean that
they are formed by the same sequence of partial quotients.

By Theorem 159, $a_0 = [x] = b_0$. Let us suppose that the first $n$ partial
quotients in the continued fractions are identical, and that $a'_n$, $b'_n$ are the nth
complete quotients. Then

$x = [a_0, a_1, \ldots, a_{n-1}, a'_n] = [a_0, a_1, \ldots, a_{n-1}, b'_n]$.

If $n = 1$, then

$a_0 + \dfrac{1}{a'_1} = a_0 + \dfrac{1}{b'_1}$,

$a'_1 = b'_1$, and therefore, by Theorem 159, $a_1 = b_1$. If $n > 1$, then, by
(10.5.1),

$\dfrac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}} = \dfrac{b'_n p_{n-1} + p_{n-2}}{b'_n q_{n-1} + q_{n-2}}$,

$(a'_n - b'_n)(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = 0$.

But $p_{n-1} q_{n-2} - p_{n-2} q_{n-1} = (-1)^n$, by Theorem 150, and so $a'_n = b'_n$. It
follows from Theorem 159 that $a_n = b_n$.

Suppose now, for example, that $N \le M$. Then our argument shows that

$a_n = b_n$

for $n \le N$. If $M > N$, then

$\dfrac{p_N}{q_N} = [a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_N, b_{N+1}, \ldots, b_M] = \dfrac{b'_{N+1} p_N + p_{N-1}}{b'_{N+1} q_N + q_{N-1}}$,

by (10.5.1); or

$p_N q_{N-1} - p_{N-1} q_N = 0$,

which is false. Hence $M = N$ and the fractions are identical.

10.6. The continued fraction algorithm and Euclid's algorithm. Let
$x$ be any real number, and let $a_0 = [x]$. Then

$x = a_0 + \xi_0$,  $0 \le \xi_0 < 1$.

If $\xi_0 \ne 0$, we can write

$\dfrac{1}{\xi_0} = a'_1$,  $[a'_1] = a_1$,  $a'_1 = a_1 + \xi_1$,  $0 \le \xi_1 < 1$.

If $\xi_1 \ne 0$, we can write

$\dfrac{1}{\xi_1} = a'_2 = a_2 + \xi_2$,  $0 \le \xi_2 < 1$,

and so on. Also $a'_n = 1/\xi_{n-1} > 1$, and so $a_n \ge 1$, for $n \ge 1$. Thus

$x = [a_0, a'_1] = \left[a_0, a_1 + \dfrac{1}{a'_2}\right] = [a_0, a_1, a'_2] = [a_0, a_1, a_2, a'_3] = \ldots$,

where $a_0, a_1, \ldots$ are integers and

$a_1 > 0$, $a_2 > 0$, ....

The system of equations

$x = a_0 + \xi_0$  $(0 \le \xi_0 < 1)$,

$\dfrac{1}{\xi_0} = a'_1 = a_1 + \xi_1$  $(0 \le \xi_1 < 1)$,

$\dfrac{1}{\xi_1} = a'_2 = a_2 + \xi_2$  $(0 \le \xi_2 < 1)$,

$\ldots$

is known as the continued fraction algorithm. The algorithm continues so
long as $\xi_n \ne 0$. If we eventually reach a value of $n$, say $N$, for which
$\xi_N = 0$, the algorithm terminates and

$x = [a_0, a_1, a_2, \ldots, a_N]$.

In this case $x$ is represented by a simple continued fraction, and is rational.
The numbers $a'_n$ are the complete quotients of the continued fraction.
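
The system of equations above can be run mechanically on any rational input. The following is a minimal sketch in Python (the function name is an illustrative choice); it uses exact rational arithmetic, so it applies only to rational $x$, for which the process terminates.

```python
from fractions import Fraction
from math import floor

def continued_fraction(x):
    """Partial quotients a_0, a_1, ..., a_N of a rational number x."""
    x = Fraction(x)
    quotients = []
    while True:
        a = floor(x)            # a_n = [a'_n], the integral part
        quotients.append(a)
        xi = x - a              # xi_n, with 0 <= xi_n < 1
        if xi == 0:             # the algorithm terminates when xi_N = 0
            return quotients
        x = 1 / xi              # next complete quotient a'_{n+1} = 1 / xi_n

print(continued_fraction(Fraction(17, 7)))    # [2, 2, 3]
print(continued_fraction(Fraction(-7, 5)))    # [-2, 1, 1, 2]
```

Only $a_0$ can be negative: every later complete quotient exceeds 1, so the later partial quotients are positive, as the text observes.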

Theorem 161. Any rational number can be represented by a finite simple
continued fraction.

If $x$ is an integer, then $\xi_0 = 0$ and $x = a_0$. If $x$ is not integral, then

$x = \dfrac{h}{k}$,

where $h$ and $k$ are integers and $k > 1$. Since

$\dfrac{h}{k} = a_0 + \xi_0$,  $h = a_0 k + \xi_0 k$,

$a_0$ is the quotient, and $k_1 = \xi_0 k$ the remainder, when $h$ is divided by $k$.†

† The 'remainder', here and in what follows, is to be non-negative (here positive). If $a_0 \ge 0$, then
$x$ and $h$ are positive and $k_1$ is the remainder in the ordinary sense of arithmetic. If $a_0 < 0$, then $x$ and
$h$ are negative and the 'remainder' is

$(x - [x])k$.

Thus if h = —l,k = 5, the 'remainder’ is 
If $\xi_0 \ne 0$, then



$a'_1 = \dfrac{1}{\xi_0} = \dfrac{k}{k_1}$

and

$\dfrac{k}{k_1} = a_1 + \xi_1$,  $k = a_1 k_1 + \xi_1 k_1$;

thus $a_1$ is the quotient, and $k_2 = \xi_1 k_1$ the remainder, when $k$ is divided by
$k_1$. We thus obtain a series of equations

$h = a_0 k + k_1$,  $k = a_1 k_1 + k_2$,  $k_1 = a_2 k_2 + k_3$,  $\ldots$

continuing so long as $\xi_n \ne 0$, or, what is the same thing, so long as
$k_{n+1} \ne 0$.

The non-negative integers $k, k_1, k_2, \ldots$ form a strictly decreasing
sequence, and so $k_{N+1} = 0$ for some $N$. It follows that $\xi_N = 0$ for
some $N$, and that the continued fraction algorithm terminates. This proves
Theorem 161.

The system of equations

$h = a_0 k + k_1$  $(0 < k_1 < k)$,

$k = a_1 k_1 + k_2$  $(0 < k_2 < k_1)$,

$\ldots$

$k_{N-2} = a_{N-1} k_{N-1} + k_N$  $(0 < k_N < k_{N-1})$,

$k_{N-1} = a_N k_N$

is known as Euclid's algorithm. The reader will recognize the process as
that adopted in elementary arithmetic to determine the greatest common
divisor $k_N$ of $h$ and $k$.

Since $\xi_N = 0$, $a'_N = a_N$; also

$0 < \dfrac{1}{a_N} = \dfrac{1}{a'_N} = \xi_{N-1} < 1$,

and so $a_N \ge 2$. Hence the algorithm determines a representation of the
type which was shown to be unique in Theorem 160. We may always make
the variation of Theorem 158.
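
Euclid's algorithm as set out above yields the partial quotients and the greatest common divisor at the same time. A minimal sketch in Python follows; the function name is illustrative, and $k > 0$ is assumed. Python's `divmod` returns a non-negative remainder for a positive divisor, which matches the convention for the 'remainder' adopted in the text when $h$ is negative.

```python
def euclid_quotients(h, k):
    """Return (partial quotients of h/k, gcd of h and k); assumes k > 0."""
    quotients = []
    while k != 0:
        a, r = divmod(h, k)     # h = a*k + r with 0 <= r < k
        quotients.append(a)
        h, k = k, r             # next step divides k by the remainder
    return quotients, h          # the last non-zero remainder is the g.c.d.

print(euclid_quotients(17, 7))    # ([2, 2, 3], 1)
print(euclid_quotients(34, 14))   # ([2, 2, 3], 2)
print(euclid_quotients(-7, 5))    # ([-2, 1, 1, 2], 1)
```

The quotients depend only on the ratio $h/k$, so 17/7 and 34/14 give the same continued fraction; only the final divisor differs. When there is more than one quotient, the last one is at least 2, in agreement with the remark following the algorithm.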

Summing up our results we obtain

Theorem 162. A rational number can be expressed as a finite simple
continued fraction in just two ways, one with an even and the other with
an odd number of convergents. In one form the last partial quotient is 1,
in the other it is greater than 1.
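
The passage between the two forms of Theorem 162 is the variation described in Theorem 158. A small Python sketch (illustrative names, exact arithmetic) converts one form into the other and confirms that the value is unchanged.

```python
from fractions import Fraction

def alternative_form(a):
    """The other simple continued fraction with the same value (Theorem 158)."""
    a = list(a)
    if a[-1] >= 2 or len(a) == 1:
        # [..., a_n] = [..., a_n - 1, 1]
        return a[:-1] + [a[-1] - 1, 1]
    # [..., a_{n-1}, 1] = [..., a_{n-1} + 1]
    return a[:-2] + [a[-2] + 1]

def value(a):
    """Evaluate [a_0, ..., a_n] from the last quotient outwards."""
    v = Fraction(a[-1])
    for term in reversed(a[:-1]):
        v = term + 1 / v
    return v

print(alternative_form([2, 2, 3]))      # [2, 2, 2, 1]
print(alternative_form([2, 2, 2, 1]))   # [2, 2, 3]
assert value([2, 2, 3]) == value([2, 2, 2, 1]) == Fraction(17, 7)
```

Applying the function twice returns the original list, and the two forms always differ by one in length, so one has an even and the other an odd number of convergents.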


10.7. The difference between the fraction and its convergents.
Throughout this section we suppose that $N > 1$ and $n > 0$. By (10.5.1)

$x = \dfrac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}$,

for $1 \le n \le N - 1$, and so

$x - \dfrac{p_n}{q_n} = -\dfrac{p_n q_{n-1} - p_{n-1} q_n}{q_n (a'_{n+1} q_n + q_{n-1})} = \dfrac{(-1)^n}{q_n (a'_{n+1} q_n + q_{n-1})}$.

Also

$x - \dfrac{p_0}{q_0} = x - a_0 = \dfrac{1}{a'_1}$.

If we write

(10.7.1)  $q'_1 = a'_1$,  $q'_n = a'_n q_{n-1} + q_{n-2}$  $(1 < n \le N)$

(so that, in particular, $q'_N = q_N$), we obtain

Theorem 163. If $1 \le n \le N - 1$, then

$x - \dfrac{p_n}{q_n} = \dfrac{(-1)^n}{q_n q'_{n+1}}$.

This formula gives another proof of Theorem 154.

Next,

$a_{n+1} < a'_{n+1} < a_{n+1} + 1$

for $n \le N - 2$, by (10.5.2), except that

$a'_{N-1} = a_{N-1} + 1$

when $a_N = 1$. Hence, if we ignore this exceptional case for the moment,
we have

(10.7.2)  $q'_1 = a'_1 < a_1 + 1 \le q_2$

and

(10.7.3)  $q'_{n+1} = a'_{n+1} q_n + q_{n-1} \ge a_{n+1} q_n + q_{n-1} = q_{n+1}$,

(10.7.4)  $q'_{n+1} < a_{n+1} q_n + q_{n-1} + q_n = q_{n+1} + q_n \le a_{n+2} q_{n+1} + q_n = q_{n+2}$,

for $1 \le n \le N - 2$. It follows that

(10.7.5)  $\dfrac{1}{q_{n+2}} < |p_n - q_n x| < \dfrac{1}{q_{n+1}}$  $(n \le N - 2)$,

while

(10.7.6)  $|p_{N-1} - q_{N-1} x| = \dfrac{1}{q_N}$,  $p_N - q_N x = 0$.

In the exceptional case, (10.7.4) must be replaced by

$q'_{N-1} = (a_{N-1} + 1) q_{N-2} + q_{N-3} = q_{N-1} + q_{N-2} = q_N$

and the first inequality in (10.7.5) by an equality. In any case (10.7.5)
shows that $|p_n - q_n x|$ decreases steadily as $n$ increases; a fortiori, since $q_n$
increases steadily,

$\left| x - \dfrac{p_n}{q_n} \right|$

decreases steadily.

We may sum up the most important of our conclusions in

Theorem 164. If $N > 1$, $n > 0$, then the differences

$x - \dfrac{p_n}{q_n}$,  $q_n x - p_n$

decrease steadily in absolute value as $n$ increases. Also

$q_n x - p_n = \dfrac{(-1)^n \delta_n}{q_{n+1}}$,

where

$0 < \delta_n < 1$  $(1 \le n \le N - 2)$,  $\delta_{N-1} = 1$,

and

(10.7.7)  $\left| x - \dfrac{p_n}{q_n} \right| \le \dfrac{1}{q_n q_{n+1}} < \dfrac{1}{q_n^2}$

for $n \le N - 1$, with inequality in both places except when $n = N - 1$.
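
The bounds of this section can be checked on a worked example. The sketch below (Python; helper names are illustrative) computes the gaps $|q_n x - p_n|$ exactly for $x = [3, 7, 15, 1, 292]$ and asserts (10.7.5), (10.7.6) and (10.7.7). It assumes at least three quotients and a last quotient greater than 1, so that the exceptional case discussed above does not arise and the inequalities are strict.

```python
from fractions import Fraction

def convergent_lists(a):
    """Lists p, q with p[n] = p_n, q[n] = q_n; assumes len(a) >= 2."""
    p, q = [a[0], a[1] * a[0] + 1], [1, a[1]]
    for n in range(2, len(a)):
        p.append(a[n] * p[-1] + p[-2])
        q.append(a[n] * q[-1] + q[-2])
    return p, q

def check_theorem_164(a):
    """Assumes len(a) >= 3 and a[-1] > 1 (the non-exceptional case)."""
    p, q = convergent_lists(a)
    N = len(a) - 1
    x = Fraction(p[N], q[N])
    gaps = [abs(q[n] * x - p[n]) for n in range(N + 1)]
    for n in range(1, N):                      # 1 <= n <= N - 1
        assert gaps[n + 1] < gaps[n]           # steady decrease
        error = abs(x - Fraction(p[n], q[n]))
        assert error <= Fraction(1, q[n] * q[n + 1]) < Fraction(1, q[n] ** 2)  # (10.7.7)
    for n in range(1, N - 1):                  # 1 <= n <= N - 2
        assert Fraction(1, q[n + 2]) < gaps[n] < Fraction(1, q[n + 1])         # (10.7.5)
    assert gaps[N - 1] == Fraction(1, q[N]) and gaps[N] == 0                   # (10.7.6)
    return gaps

print(check_theorem_164([3, 7, 15, 1, 292]))
```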
10.8. Infinite simple continued fractions. We have considered so far
only finite continued fractions; and these, when they are simple, represent 
rational numbers. The chief interest of continued fractions, however, lies 
in their application to the representation of irrationals, and for this infinite 
continued fractions are needed. 

Suppose that $a_0, a_1, a_2, \ldots$ is a sequence of integers satisfying (10.3.1),
so that

$x_n = [a_0, a_1, \ldots, a_n]$

is, for every $n$, a simple continued fraction representing a rational number
$x_n$. If, as we shall prove in a moment, $x_n$ tends to a limit $x$ when $n \to \infty$,
then it is natural to say that the simple continued fraction

(10.8.1)  $[a_0, a_1, a_2, \ldots]$

converges to the value $x$, and to write

(10.8.2)  $x = [a_0, a_1, a_2, \ldots]$.

Theorem 165. If $a_0, a_1, a_2, \ldots$ is a sequence of integers satisfying
(10.3.1), then $x_n = [a_0, a_1, \ldots, a_n]$ tends to a limit $x$ when $n \to \infty$.

We may express this more shortly as

Theorem 166. All infinite simple continued fractions are convergent.

We write

$x_n = \dfrac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]$,

as in § 10.3, and call these fractions the convergents to (10.8.1). We have
to show that the convergents tend to a limit.

If $N \ge n$, the convergent $x_n$ is also a convergent to $[a_0, a_1, \ldots, a_N]$.
Hence, by Theorem 152, the even convergents form an increasing and the
odd convergents a decreasing sequence.

Every even convergent is less than $x_1$, by Theorem 153, so that the
increasing sequence of even convergents is bounded above; and every
odd convergent is greater than $x_0$, so that the decreasing sequence of odd
convergents is bounded below. Hence the even convergents tend to a limit
$\xi_1$, and the odd convergents to a limit $\xi_2$, and $\xi_1 \le \xi_2$.

Finally, by Theorems 150 and 156,

$\left| \dfrac{p_{2n}}{q_{2n}} - \dfrac{p_{2n-1}}{q_{2n-1}} \right| = \dfrac{1}{q_{2n} q_{2n-1}} \le \dfrac{1}{2n(2n-1)} \to 0$,

so that $\xi_1 = \xi_2 = x$, say, and the fraction (10.8.1) converges to $x$.
Incidentally we see that

Theorem 167. An infinite simple continued fraction† is less than any of
its odd convergents and greater than any of its even convergents.

† Here, and often in what follows, we use 'the continued fraction' as an
abbreviation for 'the value of the continued fraction'.

10.9. The representation of an irrational number by an infinite
continued fraction. We call

$a'_n = [a_n, a_{n+1}, \ldots]$

the n-th complete quotient of the continued fraction

$x = [a_0, a_1, \ldots]$.

Clearly

$a'_n = \lim_{N \to \infty} [a_n, a_{n+1}, \ldots, a_N]$

$= a_n + \lim_{N \to \infty} \dfrac{1}{[a_{n+1}, \ldots, a_N]} = a_n + \dfrac{1}{a'_{n+1}}$,

and in particular

$x = a'_0 = a_0 + \dfrac{1}{a'_1}$.

Also

$a'_n > a_n$,  $a'_{n+1} > a_{n+1} > 0$,  $0 < \dfrac{1}{a'_{n+1}} < 1$;

and so $a_n = [a'_n]$.

Theorem 168. If $[a_0, a_1, a_2, \ldots] = x$, then

$a_0 = [x]$,  $a_n = [a'_n]$  $(n \ge 0)$.

From this we deduce, as in § 10.5,

Theorem 169. Two infinite simple continued fractions which have the
same value are identical.

We now return to the continued fraction algorithm of § 10.6. If $x$ is irrational
the process cannot terminate. Hence it defines an infinite sequence
of integers

$a_0, a_1, a_2, \ldots$,

and as before

$x = [a_0, a'_1] = [a_0, a_1, a'_2] = \ldots = [a_0, a_1, a_2, \ldots, a_n, a'_{n+1}]$,

where

$a'_{n+1} = a_{n+1} + \dfrac{1}{a'_{n+2}} > a_{n+1}$.

Hence

$x = a'_0 = \dfrac{a'_1 a_0 + 1}{a'_1} = \ldots = \dfrac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}$,

by (10.5.1), and so

$x - \dfrac{p_n}{q_n} = \dfrac{p_{n-1} q_n - p_n q_{n-1}}{q_n (a'_{n+1} q_n + q_{n-1})} = \dfrac{(-1)^n}{q_n (a'_{n+1} q_n + q_{n-1})}$,

$\left| x - \dfrac{p_n}{q_n} \right| < \dfrac{1}{q_n (a_{n+1} q_n + q_{n-1})} = \dfrac{1}{q_n q_{n+1}} \le \dfrac{1}{n(n+1)} \to 0$

when $n \to \infty$. Thus

$x = \lim_{n \to \infty} \dfrac{p_n}{q_n} = [a_0, a_1, \ldots, a_n, \ldots]$,

and the algorithm leads to the continued fraction whose value is $x$, and
which is unique by Theorem 169.

Theorem 170. Every irrational number can be expressed in just one way
as an infinite simple continued fraction.

Incidentally we see that the value of an infinite simple continued fraction
is necessarily irrational, since the algorithm would terminate if $x$ were
rational.

We define

$q'_n = a'_n q_{n-1} + q_{n-2}$

as in § 10.7. Repeating the argument of that section, we obtain

Theorem 171. The results of Theorems 163 and 164 hold also (except
for the references to $N$) for infinite continued fractions. In particular

(10.9.1)  $\left| x - \dfrac{p_n}{q_n} \right| < \dfrac{1}{q_n q_{n+1}} < \dfrac{1}{q_n^2}$.

10.10. A lemma. We shall need the theorem which follows in § 10.11.

Theorem 172. If

$x = \dfrac{P\zeta + R}{Q\zeta + S}$,

where $\zeta > 1$ and $P$, $Q$, $R$, and $S$ are integers such that

$Q > S > 0$,  $PS - QR = \pm 1$,

then $R/S$ and $P/Q$ are two consecutive convergents to the simple continued
fraction whose value is $x$. If $R/S$ is the $(n-1)$th convergent, and $P/Q$ the
n-th, then $\zeta$ is the $(n+1)$th complete quotient.

We can develop $P/Q$ in a simple continued fraction

(10.10.1)  $\dfrac{P}{Q} = [a_0, a_1, \ldots, a_n] = \dfrac{p_n}{q_n}$.

After Theorem 158, we may suppose $n$ odd or even as we please. We
shall choose $n$ so that

(10.10.2)  $PS - QR = \pm 1 = (-1)^{n-1}$.

Now $(P, Q) = 1$ and $Q > 0$, and $p_n$ and $q_n$ satisfy the same conditions.
Hence (10.10.1) and (10.10.2) imply $P = p_n$, $Q = q_n$, and

$p_n S - q_n R = PS - QR = (-1)^{n-1} = p_n q_{n-1} - p_{n-1} q_n$,

or

(10.10.3)  $p_n (S - q_{n-1}) = q_n (R - p_{n-1})$.

Since $(p_n, q_n) = 1$, (10.10.3) implies

(10.10.4)  $q_n \mid (S - q_{n-1})$.

But

$q_n = Q > S > 0$,  $q_n \ge q_{n-1} > 0$,

and so

$|S - q_{n-1}| < q_n$,

and this is inconsistent with (10.10.4) unless $S - q_{n-1} = 0$. Hence

$S = q_{n-1}$,  $R = p_{n-1}$

and

$x = \dfrac{p_n \zeta + p_{n-1}}{q_n \zeta + q_{n-1}}$

or

$x = [a_0, a_1, \ldots, a_n, \zeta]$.

If we develop $\zeta$ as a simple continued fraction, we obtain

$\zeta = [a_{n+1}, a_{n+2}, \ldots]$

where $a_{n+1} = [\zeta] \ge 1$. Hence

$x = [a_0, a_1, \ldots, a_n, a_{n+1}, a_{n+2}, \ldots]$,

a simple continued fraction. But $p_{n-1}/q_{n-1}$ and $p_n/q_n$, that is $R/S$ and $P/Q$,
are consecutive convergents of this continued fraction, and $\zeta$ is its $(n+1)$th
complete quotient.

10.11. Equivalent numbers. If $\xi$ and $\eta$ are two numbers such that

$\xi = \dfrac{a\eta + b}{c\eta + d}$,

where $a$, $b$, $c$, $d$ are integers such that $ad - bc = \pm 1$, then $\xi$ is said to be
equivalent to $\eta$. In particular, $\xi$ is equivalent to itself.†

If $\xi$ is equivalent to $\eta$, then

$\eta = \dfrac{-d\xi + b}{c\xi - a}$,  $(-d)(-a) - bc = ad - bc = \pm 1$,

and so $\eta$ is equivalent to $\xi$. Thus the relation of equivalence is symmetrical.

Theorem 173. If $\xi$ and $\eta$ are equivalent, and $\eta$ and $\zeta$ are equivalent,
then $\xi$ and $\zeta$ are equivalent.

† $a = d = 1$, $b = c = 0$.

For

$\xi = \dfrac{a\eta + b}{c\eta + d}$,  $ad - bc = \pm 1$,

$\eta = \dfrac{a'\zeta + b'}{c'\zeta + d'}$,  $a'd' - b'c' = \pm 1$,

and

$\xi = \dfrac{A\zeta + B}{C\zeta + D}$,

where

$A = aa' + bc'$,  $B = ab' + bd'$,  $C = ca' + dc'$,  $D = cb' + dd'$,

$AD - BC = (ad - bc)(a'd' - b'c') = \pm 1$.

We may also express Theorem 173 by saying that the relation of equivalence
is transitive. The theorem enables us to arrange irrationals in classes
of equivalent irrationals.

If $h$ and $k$ are coprime integers, then, by Theorem 25, there are integers
$h'$ and $k'$ such that

$hk' - h'k = 1$;

and then

$\dfrac{h}{k} = \dfrac{h' \cdot 0 + h}{k' \cdot 0 + k} = \dfrac{a \cdot 0 + b}{c \cdot 0 + d}$,

with $ad - bc = -1$. Hence any rational $h/k$ is equivalent to 0, and therefore,
by Theorem 173, to any other rational.

Theorem 174. Any two rational numbers are equivalent.

In what follows we confine our attention to irrational numbers, represented
by infinite continued fractions.

Theorem 175. Two irrational numbers $\xi$ and $\eta$ are equivalent if and
only if

(10.11.1)  $\xi = [a_0, a_1, \ldots, a_m, c_0, c_1, c_2, \ldots]$,  $\eta = [b_0, b_1, \ldots, b_n, c_0, c_1, c_2, \ldots]$,

the sequence of quotients in $\xi$ after the m-th being the same as the sequence
in $\eta$ after the n-th.

Suppose first that $\xi$ and $\eta$ are given by (10.11.1) and write

$\omega = [c_0, c_1, c_2, \ldots]$.

Then

$\xi = [a_0, a_1, \ldots, a_m, \omega] = \dfrac{p_m \omega + p_{m-1}}{q_m \omega + q_{m-1}}$,

and $p_m q_{m-1} - p_{m-1} q_m = \pm 1$, so that $\xi$ and $\omega$ are equivalent. Similarly,
$\eta$ and $\omega$ are equivalent, and so $\xi$ and $\eta$ are equivalent. The condition is
therefore sufficient.

On the other hand, if $\xi$ and $\eta$ are two equivalent numbers, we have

$\eta = \dfrac{a\xi + b}{c\xi + d}$,  $ad - bc = \pm 1$.

We may suppose $c\xi + d > 0$, since otherwise we may replace the coefficients
by their negatives. When we develop $\xi$ by the continued fraction
algorithm, we obtain

$\xi = [a_0, a_1, \ldots, a_k, a_{k+1}, \ldots]$

$= [a_0, \ldots, a_{k-1}, a'_k] = \dfrac{p_{k-1} a'_k + p_{k-2}}{q_{k-1} a'_k + q_{k-2}}$.

Hence

$\eta = \dfrac{P a'_k + R}{Q a'_k + S}$,

where

$P = a p_{k-1} + b q_{k-1}$,  $R = a p_{k-2} + b q_{k-2}$,

$Q = c p_{k-1} + d q_{k-1}$,  $S = c p_{k-2} + d q_{k-2}$,

so that $P$, $Q$, $R$, $S$ are integers and

$PS - QR = (ad - bc)(p_{k-1} q_{k-2} - p_{k-2} q_{k-1}) = \pm 1$.

By Theorem 171,

$p_{k-1} = \xi q_{k-1} + \dfrac{\delta}{q_{k-1}}$,  $p_{k-2} = \xi q_{k-2} + \dfrac{\delta'}{q_{k-2}}$,

where $|\delta| < 1$, $|\delta'| < 1$. Hence

$Q = (c\xi + d) q_{k-1} + \dfrac{c\delta}{q_{k-1}}$,  $S = (c\xi + d) q_{k-2} + \dfrac{c\delta'}{q_{k-2}}$.

Now $c\xi + d > 0$, $q_{k-1} > q_{k-2} > 0$, and $q_{k-1}$ and $q_{k-2}$ tend to infinity;
so that

$Q > S > 0$

for sufficiently large $k$. For such $k$

$\eta = \dfrac{P\zeta + R}{Q\zeta + S}$,

where

$PS - QR = \pm 1$,  $Q > S > 0$,  $\zeta = a'_k > 1$;

and so, by Theorem 172,

$\eta = [b_0, b_1, \ldots, b_l, \zeta] = [b_0, b_1, \ldots, b_l, a_k, a_{k+1}, \ldots]$

for some $b_0, b_1, \ldots, b_l$. This proves the necessity of the condition.

10.12. Periodic continued fractions. A periodic continued fraction is
an infinite continued fraction in which

$a_l = a_{l+k}$

for a fixed positive $k$ and all $l \ge L$. The set of partial quotients

$a_L, a_{L+1}, \ldots, a_{L+k-1}$

is called the period, and the continued fraction may be written

$[a_0, a_1, \ldots, a_{L-1}, \dot{a}_L, a_{L+1}, \ldots, \dot{a}_{L+k-1}]$.

We shall be concerned only with simple periodic continued fractions.

Theorem 176. A periodic continued fraction is a quadratic surd, i.e. an
irrational root of a quadratic equation with integral coefficients.

If $a'_L$ is the Lth complete quotient of the periodic continued fraction $x$,
we have

$a'_L = [a_L, a_{L+1}, \ldots, a_{L+k-1}, a_L, a_{L+1}, \ldots]$

$= [a_L, a_{L+1}, \ldots, a_{L+k-1}, a'_L]$,

$a'_L = \dfrac{p' a'_L + p''}{q' a'_L + q''}$,

(10.12.1)  $q' a'^{\,2}_L + (q'' - p') a'_L - p'' = 0$,

where the fractions $p''/q''$ and $p'/q'$ are the last two convergents to
$[a_L, a_{L+1}, \ldots, a_{L+k-1}]$.

But

$x = \dfrac{p_{L-1} a'_L + p_{L-2}}{q_{L-1} a'_L + q_{L-2}}$,  $a'_L = \dfrac{p_{L-2} - q_{L-2} x}{q_{L-1} x - p_{L-1}}$.

If we substitute for $a'_L$ in (10.12.1), and clear of fractions, we obtain an
equation

(10.12.2)  $ax^2 + bx + c = 0$

with integral coefficients. Since $x$ is irrational, $b^2 - 4ac \ne 0$.

The converse of the theorem is also true, but its proof is a little more
difficult.

Theorem 177. The continued fraction which represents a quadratic surd
is periodic.

A quadratic surd satisfies a quadratic equation with integral coefficients,
which we may write in the form (10.12.2). If

$x = [a_0, a_1, \ldots, a_n, \ldots]$,

then

$x = \dfrac{p_{n-1} a'_n + p_{n-2}}{q_{n-1} a'_n + q_{n-2}}$;

and if we substitute this in (10.12.2) we obtain

(10.12.3)  $A_n a'^{\,2}_n + B_n a'_n + C_n = 0$,

where

$A_n = a p_{n-1}^2 + b p_{n-1} q_{n-1} + c q_{n-1}^2$,

$B_n = 2a p_{n-1} p_{n-2} + b (p_{n-1} q_{n-2} + p_{n-2} q_{n-1}) + 2c q_{n-1} q_{n-2}$,

$C_n = a p_{n-2}^2 + b p_{n-2} q_{n-2} + c q_{n-2}^2$.

If

$A_n = a p_{n-1}^2 + b p_{n-1} q_{n-1} + c q_{n-1}^2 = 0$,

then (10.12.2) has the rational root $p_{n-1}/q_{n-1}$, and this is impossible
because $x$ is irrational. Hence $A_n \ne 0$ and

$A_n y^2 + B_n y + C_n = 0$

is an equation one of whose roots is $a'_n$. A little calculation shows that

(10.12.4)  $B_n^2 - 4 A_n C_n = (b^2 - 4ac)(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})^2 = b^2 - 4ac$.

By Theorem 171,

$p_{n-1} = x q_{n-1} + \dfrac{\delta_{n-1}}{q_{n-1}}$  $(|\delta_{n-1}| < 1)$.

Hence

$A_n = a \left( x q_{n-1} + \dfrac{\delta_{n-1}}{q_{n-1}} \right)^2 + b q_{n-1} \left( x q_{n-1} + \dfrac{\delta_{n-1}}{q_{n-1}} \right) + c q_{n-1}^2$

$= (ax^2 + bx + c) q_{n-1}^2 + 2ax\delta_{n-1} + a \dfrac{\delta_{n-1}^2}{q_{n-1}^2} + b\delta_{n-1}$

$= 2ax\delta_{n-1} + a \dfrac{\delta_{n-1}^2}{q_{n-1}^2} + b\delta_{n-1}$,

and

$|A_n| < 2|ax| + |a| + |b|$.

Next, since $C_n = A_{n-1}$,

$|C_n| < 2|ax| + |a| + |b|$.

Finally, by (10.12.4),

$B_n^2 \le 4|A_n C_n| + |b^2 - 4ac|$

$< 4(2|ax| + |a| + |b|)^2 + |b^2 - 4ac|$.

Hence the absolute values of $A_n$, $B_n$, and $C_n$ are less than numbers
independent of $n$.

It follows that there are only a finite number of different triplets
$(A_n, B_n, C_n)$; and we can find a triplet $(A, B, C)$ which occurs at least three
times, say as $(A_{n_1}, B_{n_1}, C_{n_1})$, $(A_{n_2}, B_{n_2}, C_{n_2})$, and $(A_{n_3}, B_{n_3}, C_{n_3})$. Hence
$a'_{n_1}$, $a'_{n_2}$, $a'_{n_3}$ are all roots of

$Ay^2 + By + C = 0$,

and at least two of them must be equal. But if, for example, $a'_{n_1} = a'_{n_2}$, then

$a_{n_2} = a_{n_1}$,  $a_{n_2+1} = a_{n_1+1}$,  $\ldots$,

and the continued fraction is periodic.
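
For the particular surd $x = \sqrt{d}$ ($d$ a positive integer, not a square) the recurrence promised by Theorem 177 can be found with integer arithmetic alone. The sketch below is in Python and uses a standard recurrence that is not taken from the text: it assumes each complete quotient is written as $(\sqrt{d} + m)/e$ with integers $m$, $e$, and stops as soon as a pair $(m, e)$ repeats. This mirrors the last step of the proof, where two equal complete quotients force the partial quotients to repeat. The function name is illustrative.

```python
from math import isqrt

def sqrt_continued_fraction(d):
    """Return (pre-period, period) of the simple continued fraction of sqrt(d)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, e, a = 0, 1, a0            # complete quotient (sqrt(d) + m) / e, integral part a
    seen = {}                     # (m, e) -> index at which it first occurred
    quotients = []
    while (m, e) not in seen:
        seen[(m, e)] = len(quotients)
        quotients.append(a)
        m = e * a - m             # next complete quotient, kept in the same form
        e = (d - m * m) // e      # exact division (a property of this recurrence)
        a = (a0 + m) // e         # its integral part
    start = seen[(m, e)]          # the complete quotient has recurred
    return quotients[:start], quotients[start:]

print(sqrt_continued_fraction(2))   # ([1], [2])
print(sqrt_continued_fraction(7))   # ([2], [1, 1, 1, 4])
```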

10.13. Some special quadratic surds. It is easy to find the continued
fraction for a special surd such as $\sqrt{2}$ or $\sqrt{3}$ by carrying out the algorithm
of § 10.6 until it recurs. Thus

(10.13.1)  $\sqrt{2} = 1 + (\sqrt{2} - 1) = 1 + \dfrac{1}{\sqrt{2} + 1} = 1 + \dfrac{1}{2 + (\sqrt{2} - 1)}$

$= 1 + \cfrac{1}{2 + \cfrac{1}{\sqrt{2} + 1}} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [1, \dot{2}]$,

and, similarly,

(10.13.2)  $\sqrt{3} = 1 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{2 + \cdots}}}} = [1, \dot{1}, \dot{2}]$,

(10.13.3)  $\sqrt{5} = 2 + \cfrac{1}{4 + \cfrac{1}{4 + \cdots}} = [2, \dot{4}]$,

(10.13.4)  $\sqrt{7} = 2 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \cdots}}}} = [2, \dot{1}, 1, 1, \dot{4}]$.

But the most interesting special continued fractions are not usually 'pure'
surds.

A particular simple type is

(10.13.5)  $x = b + \cfrac{1}{a + \cfrac{1}{b + \cfrac{1}{a + \cfrac{1}{b + \cdots}}}} = [\dot{b}, \dot{a}]$,

where $a \mid b$, so that $b = ac$, where $c$ is an integer. In this case

$x = b + \cfrac{1}{a + \cfrac{1}{x}} = \dfrac{(ab + 1)x + b}{ax + 1}$,

(10.13.6)  $x^2 - bx - c = 0$,

(10.13.7)  $x = \tfrac{1}{2}\{b + \sqrt{b^2 + 4c}\}$.

In particular

(10.13.8)  $\alpha = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = [\dot{1}] = \dfrac{\sqrt{5} + 1}{2}$,

(10.13.9)  $\beta = 2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [\dot{2}] = \sqrt{2} + 1$,

(10.13.10)  $\gamma = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cdots}} = [\dot{2}, \dot{1}] = \sqrt{3} + 1$.

It will be observed that $\beta$ and $\gamma$ are equivalent, in the sense of § 10.11, to
$\sqrt{2}$ and $\sqrt{3}$ respectively, but that $\alpha$ is not equivalent to $\sqrt{5}$.

It is easy to find a general formula for the convergents to (10.13.5).

Theorem 178. The $(n+1)$th convergent to (10.13.5) is given by

(10.13.11)  $p_n = c^{-[\frac{1}{2}(n+1)]} u_{n+2}$,  $q_n = c^{-[\frac{1}{2}(n+1)]} u_{n+1}$,†

where

(10.13.12)  $u_n = \dfrac{x^n - y^n}{x - y}$

and $x$ and $y$ are the roots of (10.13.6).

† The power of $c$ is $c^{-m}$ when $n = 2m$ and $c^{-m-1}$ when $n = 2m + 1$.

In the first place

$q_0 = 1 = u_1$,  $q_1 = a = \dfrac{b}{c} = \dfrac{x + y}{c} = \dfrac{u_2}{c}$,

$p_0 = b = x + y = u_2$,  $p_1 = ab + 1 = \dfrac{b^2 + c}{c} = \dfrac{(x + y)^2 - xy}{c} = \dfrac{u_3}{c}$,

so that the formulae (10.13.11) are true for $n = 0$ and $n = 1$. We prove the
general formulae by induction.

We have to prove that

$p_n = c^{-[\frac{1}{2}(n+1)]} u_{n+2} = w_{n+2}$,

say. Now

$x^{n+2} = b x^{n+1} + c x^n$,  $y^{n+2} = b y^{n+1} + c y^n$,

and so

(10.13.13)  $u_{n+2} = b u_{n+1} + c u_n$.

But

$u_{2m+2} = c^m w_{2m+2}$,  $u_{2m+1} = c^m w_{2m+1}$.

Substituting into (10.13.13), and distinguishing the cases of even and odd
$n$, we find that

$w_{2m+2} = b w_{2m+1} + w_{2m}$,  $w_{2m+1} = a w_{2m} + w_{2m-1}$.

Hence $w_{n+2}$ satisfies the same recurrence formulae as $p_n$, and so $p_n = w_{n+2}$.
Similarly we prove that $q_n = w_{n+1}$.

The argument is naturally a little simpler when $a = b$, $c = 1$. In this case
$p_n$ and $q_n$ satisfy

$u_{n+2} = b u_{n+1} + u_n$

and are of the form

$A x^n + B y^n$,

where $A$ and $B$ are independent of $n$ and may be determined from the values
of the first two convergents. We thus find that

$p_n = \dfrac{x^{n+2} - y^{n+2}}{x - y}$,  $q_n = \dfrac{x^{n+1} - y^{n+1}}{x - y}$,

in agreement with Theorem 178.
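
Theorem 178 can be tested without any surds by generating $u_n$ from the recurrence (10.13.13) with $u_0 = 0$, $u_1 = 1$ (the values given by (10.13.12) for $n = 0, 1$). The following Python sketch, with illustrative names, compares $c^{[\frac{1}{2}(n+1)]} p_n$ with $u_{n+2}$ and $c^{[\frac{1}{2}(n+1)]} q_n$ with $u_{n+1}$ in integers; it assumes at least two convergents are requested.

```python
def check_theorem_178(a, c, terms=10):
    """Check (10.13.11) for the fraction [b, a, b, a, ...] with b = a*c."""
    b = a * c
    # u_n from (10.13.13): u_{n+2} = b u_{n+1} + c u_n, with u_0 = 0, u_1 = 1
    u = [0, 1]
    for _ in range(terms + 2):
        u.append(b * u[-1] + c * u[-2])
    # convergents from (10.2.1), (10.2.2); quotients alternate b, a, b, a, ...
    quotients = [b if n % 2 == 0 else a for n in range(terms)]
    p, q = [b, a * b + 1], [1, a]
    for n in range(2, terms):
        p.append(quotients[n] * p[-1] + p[-2])
        q.append(quotients[n] * q[-1] + q[-2])
    for n in range(terms):
        power = c ** ((n + 1) // 2)          # c^{[(n+1)/2]}
        assert p[n] * power == u[n + 2]
        assert q[n] * power == u[n + 1]
    return p, q

p, q = check_theorem_178(a=1, c=2)   # the fraction [2, 1, 2, 1, ...] of (10.13.10)
print(p[:4], q[:4])                  # [2, 3, 8, 11] [1, 1, 3, 4]
```

With $a = b = 1$, $c = 1$ the same check reproduces the Fibonacci numbers as numerators and denominators.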
10.14. The series of Fibonacci and Lucas. In the special case $a = b = 1$
we have


(10.14.1)  $x = \dfrac{\sqrt{5} + 1}{2}$,  $y = -\dfrac{1}{x} = -\dfrac{\sqrt{5} - 1}{2}$,

$p_n = u_{n+2} = \dfrac{x^{n+2} - y^{n+2}}{\sqrt{5}}$,  $q_n = u_{n+1} = \dfrac{x^{n+1} - y^{n+1}}{\sqrt{5}}$.

The series $(u_n)$ or

(10.14.2)  1, 1, 2, 3, 5, 8, 13, 21, ...

in which the first two terms are $u_1$ and $u_2$, and each term after is the sum
of the two preceding, is usually called Fibonacci's series. There are, of
course, similar series with other initial terms, the most interesting being
the series $(v_n)$ or

(10.14.3)  1, 3, 4, 7, 11, 18, 29, 47, ...

defined by

(10.14.4)  $v_n = x^n + y^n$.

Such series have been studied in great detail by Lucas and later writers, in
particular D. H. Lehmer, and have very interesting arithmetical properties.
We shall come across the series (10.14.3) again in Ch. XV in connexion
with the Mersenne numbers.

We note here some arithmetical properties of these series, and particularly
of (10.14.2).

Theorem 179. The numbers $u_n$ and $v_n$ defined by (10.14.2) and
(10.14.3) have the following properties:

(i) $(u_n, u_{n+1}) = 1$, $(v_n, v_{n+1}) = 1$;

(ii) $u_n$ and $v_n$ are both odd or both even, and

$(u_n, v_n) = 1$,  $(u_n, v_n) = 2$

in these two cases;

(iii) $u_n \mid u_{rn}$ for every $r$;

(iv) if $(m, n) = d$ then

$(u_m, u_n) = u_d$,

and, in particular, $u_m$ and $u_n$ are coprime if $m$ and $n$ are coprime;

(v) if $(m, n) = 1$, then

$u_m u_n \mid u_{mn}$.

It is convenient to regard (10.13.12) and (10.14.4) as defining $u_n$ and $v_n$
for all integral $n$. Then

$u_0 = 0$,  $v_0 = 2$

and

(10.14.5)  $u_{-n} = -(xy)^{-n} u_n = (-1)^{n-1} u_n$,  $v_{-n} = (-1)^n v_n$.

We can verify at once that

(10.14.6)  $2u_{m+n} = u_m v_n + u_n v_m$,

(10.14.7)  $v_n^2 - 5u_n^2 = (-1)^n 4$,

(10.14.8)  $u_n^2 - u_{n-1} u_{n+1} = (-1)^{n-1}$,

(10.14.9)  $v_n^2 - v_{n-1} v_{n+1} = (-1)^n 5$.

Proceeding to the proof of the theorem, we observe first that (i) follows
from the recurrence formulae, or from (10.14.8), (10.14.9), and (10.14.7),
and (ii) from (10.14.7).

Next, suppose (iii) true for $r = 1, 2, \ldots, R - 1$. By (10.14.6),

$2u_{Rn} = u_n v_{(R-1)n} + u_{(R-1)n} v_n$.

If $u_n$ is odd, then $u_n \mid 2u_{Rn}$ and so $u_n \mid u_{Rn}$. If $u_n$ is even, then $v_n$ is even by
(ii), $u_{(R-1)n}$ by hypothesis, and $v_{(R-1)n}$ by (ii). Hence we may write

$u_{Rn} = u_n \cdot \tfrac{1}{2} v_{(R-1)n} + u_{(R-1)n} \cdot \tfrac{1}{2} v_n$,

and again $u_n \mid u_{Rn}$.

This proves (iii) for all positive $r$. The formulae (10.14.5) then show that
it is also true for negative $r$.

To prove (iv) we observe that, if $(m, n) = d$, there are integers $r$, $s$
(positive or negative) for which

$rm + sn = d$,

and that

(10.14.10)  $2u_d = u_{rm} v_{sn} + u_{sn} v_{rm}$,

by (10.14.6). Hence, if $(u_m, u_n) = h$, we have

$h \mid u_m \;.\; h \mid u_n \;\rightarrow\; h \mid u_{rm} \;.\; h \mid u_{sn} \;\rightarrow\; h \mid 2u_d$.

If $h$ is odd, $h \mid u_d$. If $h$ is even, then $u_m$ and $u_n$ are even, and so
$u_{rm}$, $u_{sn}$, $v_{rm}$, $v_{sn}$ are all even, by (ii) and (iii). We may therefore write
(10.14.10) as

$u_d = u_{rm} \left(\tfrac{1}{2} v_{sn}\right) + u_{sn} \left(\tfrac{1}{2} v_{rm}\right)$,

and it follows as before that $h \mid u_d$. Thus $h \mid u_d$ in any case. Also $u_d \mid u_m$, $u_d \mid u_n$,
by (iii), and so

$u_d \mid (u_m, u_n) = h$.

Hence

$h = u_d$,

which is (iv).

Finally, if $(m, n) = 1$, we have

$u_m \mid u_{mn}$,  $u_n \mid u_{mn}$

by (iii), and $(u_m, u_n) = 1$ by (iv). Hence

$u_m u_n \mid u_{mn}$.

In particular it follows from (iii) that $u_m$ can be prime only when $m$ is 4
(when $u_4 = 3$) or an odd prime $p$. But $u_p$ is not necessarily prime: thus

$u_{53} = 53316291173 = 953 \cdot 55945741$.
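
The identities (10.14.6)-(10.14.9) and parts (iii)-(v) of Theorem 179 lend themselves to a direct numerical check. The sketch below (Python; function names are illustrative) builds $u_n$ and $v_n$ from the common recurrence with $u_0 = 0$, $u_1 = 1$, $v_0 = 2$, $v_1 = 1$ and asserts each statement over a small range of indices.

```python
from math import gcd

def fib_lucas(count):
    """Lists u, v with u[n] = u_n and v[n] = v_n for 0 <= n <= count."""
    u, v = [0, 1], [2, 1]
    while len(u) <= count:
        u.append(u[-1] + u[-2])
        v.append(v[-1] + v[-2])
    return u, v

def check_fibonacci_properties(count=60):
    u, v = fib_lucas(count)
    for n in range(1, count):
        assert v[n] ** 2 - 5 * u[n] ** 2 == 4 * (-1) ** n            # (10.14.7)
        assert u[n] ** 2 - u[n - 1] * u[n + 1] == (-1) ** (n - 1)    # (10.14.8)
        assert v[n] ** 2 - v[n - 1] * v[n + 1] == 5 * (-1) ** n      # (10.14.9)
    for m in range(1, count // 2):
        for n in range(1, count // 2):
            assert 2 * u[m + n] == u[m] * v[n] + u[n] * v[m]         # (10.14.6)
            assert gcd(u[m], u[n]) == u[gcd(m, n)]                   # Theorem 179 (iv)
            if m * n <= count:
                assert u[m * n] % u[n] == 0                          # (iii)
                if gcd(m, n) == 1:
                    assert u[m * n] % (u[m] * u[n]) == 0             # (v)
    return u, v

u, v = check_fibonacci_properties()
print(u[53], u[53] == 953 * 55945741)   # 53316291173 True
```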

Theorem 180. Every prime $p$ divides some Fibonacci number (and
therefore an infinity of the numbers). In particular

$u_{p-1} \equiv 0 \pmod{p}$

if $p = 5m \pm 1$, and

$u_{p+1} \equiv 0 \pmod{p}$

if $p = 5m \pm 2$.

Since $u_3 = 2$ and $u_5 = 5$, we may suppose that $p \ne 2$, $p \ne 5$. It follows
from (10.13.12) and (10.14.1) that

(10.14.11)  $2^{n-1} u_n = n + \dbinom{n}{3} 5 + \dbinom{n}{5} 5^2 + \ldots$,

where the last term is $5^{\frac{1}{2}(n-1)}$ if $n$ is odd and $n \cdot 5^{\frac{1}{2}n - 1}$ if $n$ is even. If $n = p$
then

$2^{p-1} \equiv 1$,  $5^{\frac{1}{2}(p-1)} \equiv \left(\dfrac{5}{p}\right) \pmod{p}$,

by Theorems 71 and 83; and the binomial coefficients are all divisible by
$p$, except the last which is 1. Hence

$u_p \equiv \left(\dfrac{5}{p}\right) = \pm 1 \pmod{p}$

and therefore, by (10.14.8),

$u_{p-1} u_{p+1} \equiv 0 \pmod{p}$.

Also $(p - 1, p + 1) = 2$, and so

$(u_{p-1}, u_{p+1}) = u_2 = 1$,

by Theorem 179 (iv). Hence one and only one of $u_{p-1}$ and $u_{p+1}$ is divisible
by $p$.

To distinguish the two cases, take $n = p + 1$ in (10.14.11). Then

$2^p u_{p+1} = (p + 1) + \dbinom{p+1}{3} 5 + \ldots + (p + 1)\, 5^{\frac{1}{2}(p-1)}$.

Here all but the first and last coefficients are divisible by $p$,† and so

$2^p u_{p+1} \equiv 1 + \left(\dfrac{5}{p}\right) \pmod{p}$.

Hence $u_{p+1} \equiv 0 \pmod{p}$ if $\left(\dfrac{5}{p}\right) = -1$, i.e. if $p \equiv \pm 2 \pmod{5}$,* and
$u_{p-1} \equiv 0 \pmod{p}$ in the contrary case.

We shall give another proof of Theorem 180 in § 15.4.

† $\dbinom{p+1}{\nu} = \dfrac{(p+1)!}{\nu!\,(p+1-\nu)!}$, where $3 \le \nu \le p - 1$, is an integer, by Theorem 73; the numerator contains $p$, and
the denominator does not.

* By Theorem 97.
10.15. Approximation by convergents. We conclude this chapter by proving some theorems whose importance will become clearer in Ch. XI. By Theorem 171,

    $\left|\frac{p_n}{q_n} - x\right| < \frac{1}{q_n^2}$,

so that $p_n/q_n$ provides a good approximation to $x$. The theorem which follows shows that $p_n/q_n$ is the fraction, among all fractions of no greater complexity, i.e. all fractions whose denominator does not exceed $q_n$, which provides the best approximation.


Theorem 181. If $n > 1$,† $0 < q \le q_n$, and $p/q \ne p_n/q_n$, then

(10.15.1)    $\left|\frac{p_n}{q_n} - x\right| < \left|\frac{p}{q} - x\right|$.

This is included in a stronger theorem, viz.

Theorem 182. If $n > 1$, $0 < q \le q_n$, and $p/q \ne p_n/q_n$, then

(10.15.2)    $|p_n - q_n x| < |p - qx|$.

We may suppose that $(p, q) = 1$. Also, by Theorem 171,

    $|p_n - q_n x| < |p_{n-1} - q_{n-1} x|$,

and it is sufficient to prove the theorem on the assumption that $q_{n-1} < q \le q_n$, the complete theorem then following by induction.

Suppose first that $q = q_n$. Then

    $\left|\frac{p_n}{q_n} - \frac{p}{q_n}\right| \ge \frac{1}{q_n}$

if $p \ne p_n$. But


† We state Theorems 181 and 182 for $n > 1$ in order to avoid a trivial complication. The proof is valid for $n = 1$ unless $q_2 = q_{n+1} = 2$, which is possible only if $a_1 = a_2 = 1$. In this case

    $x = a_0 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{a_3 + \cdots}}}, \quad \frac{p_1}{q_1} = a_0 + 1$,

and

    $a_0 + \tfrac12 < x < a_0 + 1$

unless the fraction ends at the second 1. If this is not so then $p_1/q_1$ is nearer to $x$ than any other integer. But in the exceptional case $x = a_0 + \tfrac12$ there are two integers equidistant from $x$, and (10.15.1) may become an equality.
    $\left|\frac{p_n}{q_n} - x\right| \le \frac{1}{q_n q_{n+1}} < \frac{1}{2q_n}$,

by Theorems 171 and 156; and therefore

    $\left|\frac{p_n}{q_n} - x\right| < \left|\frac{p}{q_n} - x\right|$,

which is (10.15.2).

Next suppose that $q_{n-1} < q < q_n$, so that $p/q$ is not equal to either of $p_{n-1}/q_{n-1}$ or $p_n/q_n$. If we write

    $\mu p_n + \nu p_{n-1} = p, \quad \mu q_n + \nu q_{n-1} = q$,

then

    $\mu(p_n q_{n-1} - p_{n-1} q_n) = p q_{n-1} - q p_{n-1}$,

so that

    $\mu = \pm(p q_{n-1} - q p_{n-1})$;

and similarly

    $\nu = \pm(p q_n - q p_n)$.

Hence $\mu$ and $\nu$ are integers and neither is zero.

Since $q = \mu q_n + \nu q_{n-1} < q_n$, $\mu$ and $\nu$ must have opposite signs. By Theorem 171,

    $p_n - q_n x, \quad p_{n-1} - q_{n-1} x$

have opposite signs. Hence

    $\mu(p_n - q_n x), \quad \nu(p_{n-1} - q_{n-1} x)$

have the same sign. But

    $p - qx = \mu(p_n - q_n x) + \nu(p_{n-1} - q_{n-1} x)$,

and therefore

    $|p - qx| > |p_{n-1} - q_{n-1} x| > |p_n - q_n x|$.
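The conclusion of Theorem 182 can be checked by brute force for a particular $x$. The sketch below is illustrative only: it takes for $x$ a rational decimal truncation of $\pi$ (an assumption made so that all arithmetic is exact), and the function names `convergents` and `is_best_second_kind` are ours.

```python
from fractions import Fraction
from math import floor


def convergents(x, count):
    """Return up to `count` convergents (p_n, q_n), n = 0, 1, ..., of the rational x."""
    out = []
    p_prev, q_prev = 1, 0            # p_{-1}, q_{-1}
    a = floor(x)
    p, q = a, 1                      # p_0, q_0
    out.append((p, q))
    rest = x - a
    while len(out) < count and rest != 0:
        x = 1 / rest                 # next complete quotient
        a = floor(x)                 # next partial quotient
        rest = x - a
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out


def is_best_second_kind(x, p_n, q_n):
    """True if |p_n - q_n x| < |p - q x| for every p/q != p_n/q_n with 0 < q <= q_n."""
    best = abs(p_n - q_n * x)
    target = Fraction(p_n, q_n)
    for q in range(1, q_n + 1):
        base = floor(q * x)
        for p in (base, base + 1):   # the two integers nearest to q x
            if Fraction(p, q) != target and abs(p - q * x) <= best:
                return False
    return True


if __name__ == "__main__":
    x = Fraction("3.14159265358979")     # rational stand-in for pi
    cs = convergents(x, 4)
    print(cs)
    print([is_best_second_kind(x, p, q) for (p, q) in cs[2:]])   # n = 2, 3
```

Only the two integers adjacent to $qx$ need be tried for each $q$, since any other $p$ gives a larger $|p - qx|$.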


Our next theorem gives a refinement on the inequality (10.9.1) of 
Theorem 171. 
Theorem 183. Of any two consecutive convergents to $x$, one at least satisfies the inequality

(10.15.3)    $\left|\frac{p}{q} - x\right| < \frac{1}{2q^2}$.

Since the convergents are alternately less and greater than $x$, we have

(10.15.4)    $\left|\frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n}\right| = \left|\frac{p_n}{q_n} - x\right| + \left|\frac{p_{n+1}}{q_{n+1}} - x\right|$.

If (10.15.3) were untrue for both $p_n/q_n$ and $p_{n+1}/q_{n+1}$, then (10.15.4) would imply

    $\frac{1}{q_n q_{n+1}} = \left|\frac{p_{n+1} q_n - p_n q_{n+1}}{q_n q_{n+1}}\right| = \left|\frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n}\right| \ge \frac{1}{2q_n^2} + \frac{1}{2q_{n+1}^2}$,

or

    $(q_{n+1} - q_n)^2 \le 0$,

which is false except in the special case

    $n = 0, \quad a_1 = 1, \quad q_1 = q_0 = 1$.

In this case

    $0 < \frac{p_1}{q_1} - x = 1 - \cfrac{1}{1 + \cfrac{1}{a_2 + \cdots}} < 1 - \frac{a_2}{a_2 + 1} \le \frac12$,

so that the theorem is still true.

It follows that, when $x$ is irrational, there are an infinity of convergents $p_n/q_n$ which satisfy (10.15.3). Our last theorem in this chapter shows that this inequality is characteristic of convergents.

Theorem 184. If

(10.15.5)    $\left|\frac{p}{q} - x\right| < \frac{1}{2q^2}$,

then $p/q$ is a convergent.

If (10.15.5) is true, then

    $\frac{p}{q} - x = \frac{\epsilon\theta}{q^2}$,

where

    $\epsilon = \pm 1, \quad 0 < \theta < \tfrac12$.

We can express $p/q$ as a finite continued fraction

    $[a_0, a_1, \ldots, a_n]$;

and since, by Theorem 158, we can make $n$ odd or even at our discretion, we may suppose that

    $\epsilon = (-1)^{n-1}$.

We write

    $x = \frac{\omega p_n + p_{n-1}}{\omega q_n + q_{n-1}}$,

where $p_n/q_n$, $p_{n-1}/q_{n-1}$ are the last and the last but one convergents to the continued fraction for $p/q$. Then

    $\frac{\epsilon\theta}{q_n^2} = \frac{p_n}{q_n} - x = \frac{p_n q_{n-1} - p_{n-1} q_n}{q_n(\omega q_n + q_{n-1})} = \frac{(-1)^{n-1}}{q_n(\omega q_n + q_{n-1})}$,

and so

    $\frac{q_n}{\omega q_n + q_{n-1}} = \theta$.

Hence

    $\omega = \frac{1}{\theta} - \frac{q_{n-1}}{q_n} > 1$

(since $0 < \theta < \tfrac12$); and so, by Theorem 172, $p_{n-1}/q_{n-1}$ and $p_n/q_n$ are consecutive convergents to $x$. But $p_n/q_n = p/q$.
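Theorems 183 and 184 together say that the fractions satisfying (10.15.3) form a subset of the convergents which meets every pair of consecutive convergents. The following sketch lists all reduced fractions with small denominator that satisfy the inequality. It is an illustration under stated assumptions: $x$ is again a rational truncation of $\pi$, and the name `close_fractions` is ours.

```python
from fractions import Fraction
from math import floor, gcd


def close_fractions(x, q_max):
    """Reduced fractions p/q with 0 < q <= q_max and |p/q - x| < 1/(2 q^2)."""
    found = []
    for q in range(1, q_max + 1):
        base = floor(q * x)
        for p in (base, base + 1):           # only these p can be close enough
            if gcd(p, q) == 1 and abs(Fraction(p, q) - x) < Fraction(1, 2 * q * q):
                found.append((p, q))
    return found


if __name__ == "__main__":
    x = Fraction("3.14159265358979")         # rational stand-in for pi
    print(close_fractions(x, 200))
```

For this $x$ the convergents with denominator at most 200 are 3/1, 22/7, 333/106 and 355/113. The search returns three of them: 333/106 fails (10.15.3), while its neighbours 22/7 and 355/113 satisfy it, as Theorem 183 requires.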


NOTES 

§ 10.1. Many proofs in this and the next chapter are modelled on those given in Perron’s Kettenbrüche and Irrationalzahlen; the former contains full references to the early history of the subject. There are accounts in English in Cassels, Diophantine approximation, Olds, Continued fractions and Wall, Analytic theory of continued fractions (New York, Van Nostrand, 1948). Stark, Number theory, also gives additional references and material.

§ 10.12. Theorem 177 is Lagrange’s most famous contribution to the theory. The proof given here (Perron, Kettenbrüche, 77) is due to Charves.

§§ 10.13-14. There is a large literature concerned with Fibonacci’s and similar series. 
See Bachmann, Niedere Zahlentheorie, ii, ch. ii; Dickson, History , i, ch. xvii; D. H. Lehmer, 
Annals of Math. (2), 31 (1930), 419-48. 
APPROXIMATION OF IRRATIONALS BY RATIONALS

11.1. Statement of the problem. The problem considered in this chapter is that of the approximation of a given number $\xi$, usually irrational, by a rational fraction

    $r = \frac{p}{q}$.

We suppose throughout that $0 < \xi < 1$ and that $p/q$ is irreducible.†

Since the rationals are dense in the continuum, there are rationals as near as we please to any $\xi$. Given $\xi$ and any positive number $\epsilon$, there is an $r = p/q$ such that

    $|r - \xi| < \epsilon$;

any number can be approximated by a rational with any assigned degree of accuracy. We ask now how simply or, what is essentially the same thing, how rapidly can we approximate to $\xi$? Given $\xi$ and $\epsilon$, how complex must $p/q$ be (i.e. how large $q$) to secure an approximation with the measure of accuracy $\epsilon$? Given $\xi$ and $q$, or some upper bound for $q$, how small can we make $\epsilon$?

We have already done something to answer these questions. We proved, for example, in Ch. III (Theorem 36) that, given $\xi$ and $n$,

    $\exists\, p, q \;.\; 0 < q \le n \;.\; \left|\frac{p}{q} - \xi\right| \le \frac{1}{q(n + 1)}$,

and a fortiori

(11.1.1)    $\left|\frac{p}{q} - \xi\right| < \frac{1}{q^2}$;

and in Ch. X we proved a number of similar theorems by the use of continued fractions.‡ The inequality (11.1.1), or stronger inequalities of the same type, will recur continually throughout this chapter.

When we consider (11.1.1) more closely, we find at once that we must distinguish two cases.

† Except in § 11.12.

‡ See Theorems 171 and 183.
(1) $\xi$ is a rational $a/b$. If $r \ne \xi$, then

(11.1.2)    $|r - \xi| = \left|\frac{p}{q} - \frac{a}{b}\right| = \frac{|bp - aq|}{bq} \ge \frac{1}{bq}$,

so that (11.1.1) involves $q < b$. There are therefore only a finite number of solutions of (11.1.1).

(2) $\xi$ is irrational. Then there are an infinity of solutions of (11.1.1). For, if $p_n/q_n$ is any one of the convergents to the continued fraction to $\xi$, then, by Theorem 171,

    $\left|\frac{p_n}{q_n} - \xi\right| < \frac{1}{q_n^2}$,

and $p_n/q_n$ is a solution.

Theorem 185. If $\xi$ is irrational, then there is an infinity of fractions $p/q$ which satisfy (11.1.1).

In § 11.3 we shall give an alternative proof, independent of the theory of continued fractions.

11.2. Generalities concerning the problem. We can regard our problem from two different points of view. We suppose $\xi$ irrational.

(1) We may think first of $\epsilon$. Given $\xi$, for what functions

    $\Phi = \Phi\left(\xi, \frac{1}{\epsilon}\right)$

is it true that

(11.2.1)    $\exists\, p, q \;.\; q \le \Phi \;.\; \left|\frac{p}{q} - \xi\right| \le \epsilon$,

for the given $\xi$ and every positive $\epsilon$? Or for what functions

    $\Phi = \Phi\left(\frac{1}{\epsilon}\right)$,

independent of $\xi$, is (11.2.1) true for every $\xi$ and every positive $\epsilon$? It is plain that any $\Phi$ with these properties must tend to infinity when $\epsilon$ tends to zero, but the more slowly it does so the better.
There are certainly some functions $\Phi$ which have the properties required. Thus we may take

    $\Phi = \left[\frac{1}{2\epsilon}\right] + 1$,

and $q = \Phi$. There is then a $p$ for which

    $\left|\frac{p}{q} - \xi\right| \le \frac{1}{2q} < \epsilon$,

and so this $\Phi$ satisfies our requirements. The problem remains of finding, if possible, more advantageous forms of $\Phi$.

(2) We may think first of $q$. Given $\xi$, for what functions

    $\phi = \phi(\xi, q)$,

tending to infinity with $q$, is it true that

(11.2.2)    $\exists\, p \;.\; \left|\frac{p}{q} - \xi\right| \le \frac{1}{\phi}$?

Or for what functions $\phi = \phi(q)$, independent of $\xi$, is (11.2.2) true for every $\xi$? Here, naturally, the larger $\phi$ the better. If we put the question in its second and stronger form, it is substantially the same as the second form of question (1). If $\phi$ is the function inverse to $\Phi$, it is substantially the same thing to assert that (11.2.1) is true (with $\Phi$ independent of $\xi$) or that (11.2.2) is true for all $\xi$ and $q$.

These questions, however, are not the questions most interesting to us now. We are not so much interested in approximations to $\xi$ with an arbitrary denominator $q$, as in approximations with an appropriately selected $q$. For example, there is no great interest in approximations to $\pi$ with denominator 11; what is interesting is that two particular denominators, 7 and 113, give the very striking approximations $\frac{22}{7}$ and $\frac{355}{113}$. We should ask, not how closely we can approximate to $\xi$ with an arbitrary $q$, but how closely we can approximate for an infinity of values of $q$.

We shall therefore be occupied, throughout the rest of this chapter, with the following problem: for what $\phi = \phi(\xi, q)$, or $\phi = \phi(q)$, is it true, for a given $\xi$, or for all $\xi$, or for all $\xi$ of some interesting class, that

(11.2.3)    $\left|\frac{p}{q} - \xi\right| \le \frac{1}{\phi}$

for an infinity of $q$ and appropriate $p$? We know already, after Theorem 171, that we can take $\phi = q^2$ for all irrational $\xi$.

11.3. An argument of Dirichlet. In this section we prove Theorem 185 by a method independent of the theory of continued fractions. The method gives nothing new, but is of great importance because it can be extended to multi-dimensional problems.†

We have already defined $[x]$, the greatest integer in $x$. We define $(x)$ by

    $(x) = x - [x]$;

and $\bar{x}$ as the difference between $x$ and the nearest integer, with the convention that $\bar{x} = \tfrac12$ when $x$ is $n + \tfrac12$. Thus

    $\left[\tfrac53\right] = 1, \quad \left(\tfrac53\right) = \tfrac23, \quad \overline{\tfrac53} = -\tfrac13$.
Suppose $\xi$ and $\epsilon$ given. Then the $Q + 1$ numbers

    $0, (\xi), (2\xi), \ldots, (Q\xi)$

define $Q + 1$ points distributed among the $Q$ intervals or ‘boxes’

    $\frac{s}{Q} \le x < \frac{s + 1}{Q} \quad (s = 0, 1, \ldots, Q - 1)$.

There must be one box which contains at least two points, and therefore two numbers $q_1$ and $q_2$, not greater than $Q$, such that $(q_1\xi)$ and $(q_2\xi)$ differ by less than $1/Q$. If $q_2$ is the greater, and $q = q_2 - q_1$, then $0 < q \le Q$ and $|\overline{q\xi}| < 1/Q$. There is therefore a $p$ such that

    $|q\xi - p| < \frac{1}{Q}$.

Hence, taking

    $Q = \left[\frac{1}{\epsilon}\right] + 1$,

we obtain

    $\exists\, p, q \;.\; q \le \left[\frac{1}{\epsilon}\right] + 1 \;.\; \left|\frac{p}{q} - \xi\right| < \frac{\epsilon}{q}$

† See § 11.12.
(which is nearly the same as the result of Theorem 36) and

(11.3.1)    $\left|\frac{p}{q} - \xi\right| < \frac{1}{qQ} \le \frac{1}{q^2}$,

which is (11.1.1).

If $\xi$ is rational, then there is only a finite number of solutions.† We have to prove that there is an infinity when $\xi$ is irrational. Suppose that

    $\frac{p_1}{q_1}, \frac{p_2}{q_2}, \ldots, \frac{p_k}{q_k}$

exhaust the solutions. Since $\xi$ is irrational, there is a $Q$ such that

    $\left|\frac{p_s}{q_s} - \xi\right| > \frac{1}{Q} \quad (s = 1, 2, \ldots, k)$.

But then the $p/q$ of (11.3.1) satisfies

    $\left|\frac{p}{q} - \xi\right| < \frac{1}{qQ} \le \frac{1}{Q}$,

and is not one of $p_s/q_s$; a contradiction. Hence the number of solutions of (11.1.1) is infinite.

Dirichlet’s argument proves that $q\xi$ is nearly an integer, so that $(q\xi)$ is nearly 0 or 1, but does not distinguish between these cases. The argument of § 11.1 gives rather more: for

    $\frac{p_n}{q_n} - \xi = \frac{(-1)^{n-1}}{q_n q'_{n+1}}$

is positive or negative according as $n$ is odd or even, and $q_n\xi$ is alternately a little less and a little greater than $p_n$.
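Dirichlet's box argument is constructive, and can be turned directly into a search. The sketch below is one possible rendering, assuming $\xi$ is supplied as an exact rational (here a decimal truncation of $\sqrt2 - 1$); the function name `dirichlet_approximation` is ours.

```python
from fractions import Fraction
from math import floor


def dirichlet_approximation(xi, Q):
    """Return (p, q) with 0 < q <= Q and |q*xi - p| < 1/Q, found by
    placing the Q + 1 fractional parts (l*xi), l = 0..Q, into Q boxes."""
    boxes = {}                                   # box index s -> multiplier l
    for l in range(Q + 1):
        frac = l * xi - floor(l * xi)            # (l xi), in [0, 1)
        s = floor(frac * Q)                      # box s/Q <= x < (s + 1)/Q
        if s in boxes:                           # two points in one box
            q1 = boxes[s]
            q = l - q1                           # q = q_2 - q_1
            p = floor(l * xi) - floor(q1 * xi)   # so q*xi - p = (l xi) - (q1 xi)
            return p, q
        boxes[s] = l
    raise AssertionError("unreachable: Q + 1 points cannot fit in Q separate boxes")


if __name__ == "__main__":
    xi = Fraction("0.41421356237309")            # rational stand-in for sqrt(2) - 1
    for Q in (5, 10, 100):
        p, q = dirichlet_approximation(xi, Q)
        print(Q, (p, q), abs(q * xi - p) < Fraction(1, Q))
```

The search stops at the first collision, so the $q$ it returns need not be the best denominator below $Q$; the argument promises only that some such $q$ exists.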

11.4. Orders of approximation. We shall say that $\xi$ is approximable by rationals to order $n$ if there is a $K(\xi)$, depending only on $\xi$, for which

(11.4.1)    $\left|\frac{p}{q} - \xi\right| < \frac{K(\xi)}{q^n}$

has an infinity of solutions.

We can dismiss the trivial case in which $\xi$ is rational. If we look back at (11.1.2), and observe that the equation $bp - aq = 1$ has an infinity of


† The proof of this in § 11.1 was independent of continued fractions.
solutions, we obtain

Theorem 186. A rational is approximable to order 1, and to no higher order.

We may therefore suppose $\xi$ irrational. After Theorem 171, we have

Theorem 187. Any irrational is approximable to order 2.

We can go farther when $\xi$ is a quadratic surd (i.e. the root of a quadratic equation with integral coefficients). We shall sometimes describe such a $\xi$ as a quadratic irrational, or simply as ‘quadratic’.

Theorem 188. A quadratic irrational is approximable to order 2 and to no higher order.

The continued fraction for a quadratic $\xi$ is periodic, by Theorem 177. In particular its quotients are bounded, so that

    $0 < a_n < M$,

where $M$ depends only on $\xi$. Hence, by (10.5.2),

    $q'_{n+1} = a'_{n+1} q_n + q_{n-1} < (a_{n+1} + 1) q_n + q_{n-1} < (M + 2) q_n$

and a fortiori $q_{n+1} < (M + 2) q_n$. Similarly $q_n < (M + 2) q_{n-1}$.

Suppose now that $q_{n-1} < q \le q_n$. Then $q_n < (M + 2) q$ and, by Theorem 181,

    $\left|\frac{p}{q} - \xi\right| \ge \left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} > \frac{1}{(M + 2) q_n^2} > \frac{1}{(M + 2)^3 q^2} = \frac{K}{q^2}$,

where $K = (M + 2)^{-3}$; and this proves the theorem.

The negative half of Theorem 188 is a special case of a theorem 
(Theorem 191) which we shall prove in § 11.7 without the use of con- 
tinued fractions. This requires some preliminary explanations and some 
new definitions. 


11.5. Algebraic and transcendental numbers. An algebraic number is a number $x$ which satisfies an algebraic equation, i.e. an equation

(11.5.1)    $a_0 x^n + a_1 x^{n-1} + \cdots + a_n = 0$,

where $a_0, a_1, \ldots$ are integers, not all zero.

A number which is not algebraic is called transcendental. 
If $x = a/b$, then $bx - a = 0$, so that any rational $x$ is algebraic. Any quadratic surd is algebraic; thus $i = \sqrt{-1}$ is algebraic. But in this chapter we are concerned with real algebraic numbers.

An algebraic number satisfies any number of algebraic equations of different degrees; thus $x = \sqrt2$ satisfies $x^2 - 2 = 0$, $x^4 - 4 = 0$, .... If $x$ satisfies an algebraic equation of degree $n$, but none of lower degree, then we say that $x$ is of degree $n$. Thus a rational is of degree 1.

A number is Euclidean if it measures a length which can be constructed, starting from a given unit length, by a Euclidean construction, i.e. a finite construction with ruler and compasses only. Thus $\sqrt2$ is Euclidean. It is plain that we can construct any finite combination of real quadratic surds, such as

(11.5.2)    $\sqrt{11 + 2\sqrt7} - \sqrt{11 - 2\sqrt7}$

by Euclidean methods. We may describe such a number as of real quadratic type.

Conversely, any Euclidean construction depends upon a series of points defined as intersections of lines and circles. The coordinates of each point in turn are defined by two equations of the types

    $lx + my + n = 0$

or

    $x^2 + y^2 + 2gx + 2fy + c = 0$,

where $l, m, n, g, f, c$ are measures of lengths already constructed; and two such equations define $x$ and $y$ as real quadratic combinations of $l, m, \ldots$. Hence every Euclidean number is of real quadratic type.

The number (11.5.2) is defined by

    $x = y - z, \quad y^2 = 11 + 2t, \quad z^2 = 11 - 2t, \quad t^2 = 7$

and we obtain

    $x^4 - 44x^2 + 112 = 0$

on eliminating $y$, $z$, and $t$. Thus $x$ is algebraic. It is not difficult to prove that any Euclidean number is algebraic, but the proof demands a little knowledge of the general theory of algebraic numbers.†
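The elimination can be confirmed numerically. Since $x^2 = y^2 + z^2 - 2yz = 22 - 2\sqrt{93}$, we have $(x^2 - 22)^2 = 372$, which is the quartic stated. The short sketch below evaluates the quartic at the number (11.5.2) in floating point; the function names are ours, and the check is approximate, not a proof.

```python
import math


def euclidean_example():
    """The number (11.5.2): sqrt(11 + 2 sqrt 7) - sqrt(11 - 2 sqrt 7)."""
    t = math.sqrt(7)
    return math.sqrt(11 + 2 * t) - math.sqrt(11 - 2 * t)


def quartic_residual(x):
    """Value of x^4 - 44 x^2 + 112; zero (up to rounding) for a root."""
    return x ** 4 - 44 * x ** 2 + 112


if __name__ == "__main__":
    x = euclidean_example()
    print(abs(quartic_residual(x)) < 1e-9)
```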


† In fact any number defined by an equation $\alpha_0 x^n + \alpha_1 x^{n-1} + \cdots + \alpha_n = 0$, where $\alpha_0, \alpha_1, \ldots, \alpha_n$ are algebraic, is algebraic. For the proof see Hecke 66, or Hardy, Pure mathematics (ed. 9, 1944), 39.
11.6. The existence of transcendental numbers. It is not immediately obvious that there are any transcendental numbers, though actually, as we shall see in a moment, almost all real numbers are transcendental.

We may distinguish three different problems. The first is that of proving the existence of transcendental numbers (without necessarily producing a specimen). The second is that of giving an example of a transcendental number by a construction specially designed for the purpose. The third, which is much more difficult, is that of proving that some number given independently, some one of the ‘natural’ numbers of analysis, such as $e$ or $\pi$, is transcendental.

We may define the rank of the equation (11.5.1) as

    $N = n + |a_0| + |a_1| + \cdots + |a_n|$.

The minimum value of $N$ is 2. It is plain that there are only a finite number of equations

    $E_{N,1}, E_{N,2}, \ldots, E_{N,k_N}$

of rank $N$. We can arrange the equations in the sequence

    $E_{2,1}, E_{2,2}, \ldots, E_{2,k_2}, E_{3,1}, E_{3,2}, \ldots, E_{3,k_3}, E_{4,1}, \ldots$

and so correlate them with the numbers 1, 2, 3, .... Hence the aggregate of equations is enumerable. But every algebraic number corresponds to at least one of these equations, and the number of algebraic numbers corresponding to any equation is finite. Hence

Theorem 189. The aggregate of algebraic numbers is enumerable. 

In particular, the aggregate of real algebraic numbers has measure zero. 

Theorem 190. Almost all real numbers are transcendental. 

Cantor, who had not the more modern concept of measure, arranged his proof of the existence of transcendental numbers differently. After Theorem 189, it is enough to prove that the continuum $0 \le x < 1$ is not enumerable. We represent $x$ by its decimal

    $x = \cdot a_1 a_2 a_3 \ldots$

(9 being excluded, as in § 9.1). Suppose that the continuum is enumerable, as $x_1, x_2, x_3, \ldots$, and let

    $x_1 = \cdot a_{11} a_{12} a_{13} \ldots$
    $x_2 = \cdot a_{21} a_{22} a_{23} \ldots$
    $x_3 = \cdot a_{31} a_{32} a_{33} \ldots$

If now we define $a_n$ by

    $a_n = a_{nn} + 1$ (if $a_{nn}$ is neither 8 nor 9),

    $a_n = 0$ (if $a_{nn}$ is 8 or 9),

then $a_n \ne a_{nn}$ for any $n$; and $x$ cannot be any of $x_1, x_2, \ldots$, since its decimal differs from that of any $x_n$ in the $n$th digit. This is a contradiction.
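The diagonal rule can be written out for a finite table of digits. This is only an illustration of the rule for choosing $a_n$, applied to the first few rows of a supposed enumeration; the function name `diagonal_digits` is ours.

```python
def diagonal_digits(rows):
    """Given rows[n] = digits of x_{n+1}, return digits a_n that differ from
    the diagonal digit a_nn in every place, following the rule in the text."""
    digits = []
    for n, row in enumerate(rows):
        d = row[n]                       # the diagonal digit a_nn
        digits.append(0 if d in (8, 9) else d + 1)
    return digits


if __name__ == "__main__":
    table = [
        [1, 2, 3],
        [4, 8, 6],
        [7, 8, 9],
    ]
    print(diagonal_digits(table))        # digits of a number missing from the table
```

The rule never produces the digit 9, so the resulting decimal does not end in recurring 9's.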

11.7. Liouville’s theorem and the construction of transcendental numbers. Liouville proved a theorem which enables us to produce as many examples of transcendental numbers as we please. It is the generalization to algebraic numbers of any degree of the negative half of Theorem 188.

Theorem 191. A real algebraic number of degree n is not approximable 
to any order greater than n. 

An algebraic number $\xi$ satisfies an equation

    $f(\xi) = a_0 \xi^n + a_1 \xi^{n-1} + \cdots + a_n = 0$

with integral coefficients. There is a number $M(\xi)$ such that

(11.7.1)    $|f'(x)| < M \quad (\xi - 1 < x < \xi + 1)$.

Suppose now that $p/q \ne \xi$ is an approximation to $\xi$. We may assume the approximation close enough to ensure that $p/q$ lies in $(\xi - 1, \xi + 1)$, and is nearer to $\xi$ than any other root of $f(x) = 0$, so that $f(p/q) \ne 0$. Then

(11.7.2)    $\left|f\left(\frac{p}{q}\right)\right| = \frac{|a_0 p^n + a_1 p^{n-1} q + \cdots + a_n q^n|}{q^n} \ge \frac{1}{q^n}$,

since the numerator is a positive integer; and

(11.7.3)    $f\left(\frac{p}{q}\right) = f\left(\frac{p}{q}\right) - f(\xi) = \left(\frac{p}{q} - \xi\right) f'(x)$,

where $x$ lies between $p/q$ and $\xi$. It follows from (11.7.2) and (11.7.3) that

    $\left|\frac{p}{q} - \xi\right| = \frac{|f(p/q)|}{|f'(x)|} > \frac{1}{M q^n} = \frac{K}{q^n}$,

so that $\xi$ is not approximable to any order higher than $n$.

The cases $n = 1$ and $n = 2$ are covered by Theorems 186 and 188. These theorems, of course, included a positive as well as a negative statement.
(a) Suppose, for example, that

    $\xi = \cdot 110001000\ldots = 10^{-1!} + 10^{-2!} + 10^{-3!} + \cdots$,

that $n > N$, and that $\xi_n$ is the sum of the first $n$ terms of the series. Then

    $\xi_n = \frac{p}{10^{n!}} = \frac{p}{q}$,

say. Also

    $0 < \xi - \frac{p}{q} = \xi - \xi_n = 10^{-(n+1)!} + 10^{-(n+2)!} + \cdots < 2 \cdot 10^{-(n+1)!} < 2q^{-N}$.

Hence $\xi$ is not an algebraic number of degree less than $N$. Since $N$ is arbitrary, $\xi$ is transcendental.
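The inequalities in (a) can be examined with exact rational arithmetic. The sketch below is illustrative: the infinite tail $\xi - \xi_n$ is replaced by its first two terms (an assumption, giving a lower bound for the tail), and the names `liouville_partial` and `liouville_bound_holds` are ours.

```python
from fractions import Fraction
from math import factorial


def liouville_partial(n):
    """xi_n = 10^(-1!) + ... + 10^(-n!) as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, n + 1))


def liouville_bound_holds(n, N):
    """For n > N, check that xi_n = p/q with q = 10^(n!), that the first two
    tail terms lie below 2 * 10^(-(n+1)!), and that this bound is below 2 * q^(-N)."""
    q = 10 ** factorial(n)
    xi_n = liouville_partial(n)
    tail_start = liouville_partial(n + 2) - xi_n      # two terms of xi - xi_n
    tail_bound = Fraction(2, 10 ** factorial(n + 1))
    return (xi_n.denominator == q
            and 0 < tail_start < tail_bound
            and tail_bound < 2 * Fraction(1, q ** N))


if __name__ == "__main__":
    print(liouville_partial(3))            # 110001/1000000
    print(liouville_bound_holds(3, 2), liouville_bound_holds(4, 3))
```

The final comparison is simply $(n + 1)! > N \cdot n!$, which holds because $n + 1 > N$.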

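(b) Suppose that

    $\xi = \cfrac{1}{10 + \cfrac{1}{10^{2!} + \cfrac{1}{10^{3!} + \cdots}}}$,

that $n > N$, and that

    $\frac{p}{q} = \frac{p_n}{q_n}$,

the $n$th convergent to $\xi$. Then

    $\left|\frac{p}{q} - \xi\right| = \frac{1}{q_n q'_{n+1}} < \frac{1}{a_{n+1} q_n^2} < \frac{1}{a_{n+1}}$.

Now $a_{n+1} = 10^{(n+1)!}$ and

    $q_1 < a_1 + 1, \quad \frac{q_{n+1}}{q_n} = a_{n+1} + \frac{q_{n-1}}{q_n} < a_{n+1} + 1 \quad (n \ge 1)$;

so that

    $q_n < (a_1 + 1)(a_2 + 1) \cdots (a_n + 1) < \left(1 + \frac{1}{10}\right)\left(1 + \frac{1}{10^2}\right) \cdots \left(1 + \frac{1}{10^n}\right) a_1 a_2 \cdots a_n$

    $< 2 a_1 a_2 \cdots a_n = 2 \cdot 10^{1! + \cdots + n!} < 10^{2(n!)} = a_n^2$,

    $\left|\frac{p}{q} - \xi\right| < \frac{1}{a_{n+1}} = \frac{1}{a_n^{n+1}} < \frac{1}{a_n^N} < \frac{1}{q^{N/2}}$.

We conclude, as before, that $\xi$ is transcendental.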
Theorem 192. The numbers

    $\xi = 10^{-1!} + 10^{-2!} + 10^{-3!} + \cdots$

and

    $\xi = \cfrac{1}{10^{1!} + \cfrac{1}{10^{2!} + \cfrac{1}{10^{3!} + \cdots}}}$

are transcendental.

It is plain that we could replace 10 by other integers, and vary the construction in many other ways. The general principle of the construction is simply that a number defined by a sufficiently rapid sequence of rational approximations is necessarily transcendental. It is the simplest irrationals, such as $\sqrt2$ or $\tfrac12(\sqrt5 - 1)$, which are the least rapidly approximable.

It is much more difficult to prove that a number given ‘naturally’ is transcendental. We shall prove $e$ and $\pi$ transcendental in §§ 11.13-14. Few classes of transcendental numbers are known even now. These classes include, for example, the numbers

    $e, \quad \pi, \quad \sin 1, \quad J_0(1), \quad \log 2, \quad \frac{\log 3}{\log 2}, \quad e^\pi, \quad 2^{\sqrt2}$

but not $2^e$, $2^\pi$, $\pi^e$, or Euler’s constant $\gamma$. It has never been proved even that any of these last numbers are irrational.

11.8. The measure of the closest approximations to an arbitrary irrational. We know that every irrational has an infinity of approximations satisfying (11.1.1), and indeed, after Theorem 183 of Ch. X, of rather better approximations. We know also that an algebraic number, which is an irrational of a comparatively simple type, cannot be ‘too rapidly’ approximable, while the transcendental numbers of Theorem 192 have approximations of abnormal rapidity.

The best approximations to $\xi$ are given, after Theorem 181, by the convergents $p_n/q_n$ of the continued fraction for $\xi$; and

    $\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} < \frac{1}{a_{n+1} q_n^2}$,

so that we get a particularly good approximation when $a_{n+1}$ is large. It is plain that, to put the matter roughly, $\xi$ will or will not be rapidly approximable according as its continued fraction does or does not contain a sequence of rapidly increasing quotients. The second $\xi$ of Theorem 192, whose quotients increase with great rapidity, is a particularly instructive example.

One may say, again very roughly, that the structure of the continued fraction for $\xi$ affords a measure of the ‘simplicity’ or ‘complexity’ of $\xi$. Thus the second $\xi$ of Theorem 192 is a ‘complicated’ number. On the other hand, if $a_n$ behaves regularly, and does not become too large, then $\xi$ may reasonably be regarded as a ‘simple’ number; and in this case the rational approximations to $\xi$ cannot be too good. From the point of view of rational approximation, the simplest numbers are the worst.

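The ‘simplest’ of all irrationals, from this point of view, is the number

(11.8.1)    $\xi = \tfrac12(\sqrt5 - 1) = \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$,

in which every $a_n$ has the smallest possible value. The convergents to this fraction are

    $\frac01, \frac11, \frac12, \frac23, \frac35, \frac58, \ldots$

so that $q_{n-1} = p_n$ and

    $\frac{q_{n-1}}{q_n} = \frac{p_n}{q_n} \to \xi$.

Hence

    $\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} = \frac{1}{q_n\{(1 + \xi) q_n + q_{n-1}\}} = \frac{1}{q_n^2\left(1 + \xi + \dfrac{q_{n-1}}{q_n}\right)} \sim \frac{1}{q_n^2 (1 + 2\xi)} = \frac{1}{q_n^2 \sqrt5}$,

when $n \to \infty$.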
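The limit just obtained can be observed numerically: for the convergents of (11.8.1), the quantity $q_n^2\,|p_n/q_n - \xi|$ approaches $1/\sqrt5 \approx 0.4472$. The sketch below uses Python's `decimal` module with 60 significant digits (an arbitrary choice of ours, ample for the range shown); the function name `scaled_errors` is ours.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60
SQRT5 = Decimal(5).sqrt()
XI = (SQRT5 - 1) / 2                      # the number (11.8.1)


def scaled_errors(count):
    """q_n^2 * |p_n/q_n - xi| for the convergents 1/1, 1/2, 2/3, 3/5, ..."""
    p, q = 1, 1                           # p_1 / q_1
    values = []
    for _ in range(count):
        # q^2 * |p/q - xi| = q * |p - q*xi|
        values.append(abs(Decimal(p) - Decimal(q) * XI) * q)
        p, q = q, p + q                   # next convergent: p_{n+1} = q_n
    return values


if __name__ == "__main__":
    errs = scaled_errors(30)
    print(errs[0], errs[1], errs[2])      # about 0.382, 0.472, 0.438
    print(abs(errs[-1] - 1 / SQRT5) < Decimal("1e-9"))
```

The values oscillate about $1/\sqrt5$, lying alternately below and above it.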

These considerations suggest the truth of the following theorem.

Theorem 193. Any irrational $\xi$ has an infinity of approximations which satisfy

(11.8.2)    $\left|\frac{p}{q} - \xi\right| < \frac{1}{q^2 \sqrt5}$.

The proof of this theorem requires some further analysis of the approximations given by the convergents to the continued fraction. This we give in the next section, but we prove first a complement to the theorem which shows that it is in a certain sense a ‘best possible’ theorem.
Theorem 194. In Theorem 193, the number $\sqrt5$ is the best possible number: the theorem would become false if any larger number were substituted for $\sqrt5$.

It is enough to show that, if $A > \sqrt5$, and $\xi$ is the particular number (11.8.1), then the inequality

    $\left|\frac{p}{q} - \xi\right| < \frac{1}{Aq^2}$

has only a finite number of solutions.

Suppose the contrary. Then there are infinitely many $q$ and $p$ such that

    $\xi = \frac{p}{q} + \frac{\delta}{q^2}, \quad |\delta| < \frac{1}{A} < \frac{1}{\sqrt5}$.

Hence

    $\frac{\delta}{q} = q\xi - p, \quad \frac{\delta}{q} - \tfrac12 q\sqrt5 = -\tfrac12 q - p$,

    $\frac{\delta^2}{q^2} - \delta\sqrt5 = \left(\tfrac12 q + p\right)^2 - \tfrac54 q^2 = p^2 + pq - q^2$.

The left-hand side is numerically less than 1 when $q$ is large, while the right-hand side is integral. Hence $p^2 + pq - q^2 = 0$ or $(2p + q)^2 = 5q^2$, which is plainly impossible.


11.9. Another theorem concerning the convergents to a continued fraction. Our main object in this section is to prove

Theorem 195. Of any three consecutive convergents to $\xi$, one at least satisfies (11.8.2).

This theorem should be compared with Theorem 183 of Ch. X.

We write

(11.9.1)    $\frac{q_{n-1}}{q_n} = b_{n+1}$.

Then

    $\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} = \frac{1}{q_n^2 (a'_{n+1} + b_{n+1})}$,

and it is enough to prove that

(11.9.2)    $a'_i + b_i \le \sqrt5$

cannot be true for the three values $n - 1$, $n$, $n + 1$ of $i$.
Suppose that (11.9.2) is true for $i = n - 1$ and $i = n$. We have

    $a'_{n-1} = a_{n-1} + \frac{1}{a'_n}$

and

(11.9.3)    $\frac{1}{b_n} = \frac{q_{n-1}}{q_{n-2}} = a_{n-1} + b_{n-1}$.

Hence

    $\frac{1}{a'_n} + \frac{1}{b_n} = a'_{n-1} + b_{n-1} \le \sqrt5$,

and

    $1 = a'_n \cdot \frac{1}{a'_n} \le (\sqrt5 - b_n)\left(\sqrt5 - \frac{1}{b_n}\right)$

or

    $b_n + \frac{1}{b_n} \le \sqrt5$.

Equality is excluded, since $b_n$ is rational, and $b_n < 1$. Hence

    $b_n^2 - b_n\sqrt5 + 1 < 0, \quad \left(\tfrac12\sqrt5 - b_n\right)^2 < \tfrac14$,

(11.9.4)    $b_n > \tfrac12(\sqrt5 - 1)$.

If (11.9.2) were true also for $i = n + 1$, we could prove similarly that

(11.9.5)    $b_{n+1} > \tfrac12(\sqrt5 - 1)$;

and (11.9.3),† (11.9.4), and (11.9.5) would give

    $a_n = \frac{1}{b_{n+1}} - b_n < \tfrac12(\sqrt5 + 1) - \tfrac12(\sqrt5 - 1) = 1$,

a contradiction. This proves Theorem 195, and Theorem 193 is a corollary.

† With $n + 1$ for $n$.
11.10. Continued fractions with bounded quotients. The number $\sqrt5$ has a special status, in Theorems 193 and 195, which depends upon the particular properties of the number (11.8.1). For this $\xi$, every $a_n$ is 1; for a $\xi$ equivalent to this one, in the sense of § 10.11, every $a_n$ from a certain point is 1; but, for any other $\xi$, $a_n$ is at least 2 for infinitely many $n$. It is natural to suppose that, if we excluded $\xi$ equivalent to (11.8.1), the $\sqrt5$ of Theorem 193 could be replaced by some larger number; and this is actually true. Any irrational $\xi$ not equivalent to (11.8.1) has an infinity of rational approximations for which

    $\left|\frac{p}{q} - \xi\right| < \frac{1}{2q^2\sqrt2}$.

There are other numbers besides $\sqrt5$ and $2\sqrt2$ which play a special part in problems of this character, but we cannot discuss these problems further here.

If $a_n$ is not bounded, i.e. if

(11.10.1)    $\limsup_{n \to \infty} a_n = \infty$,

then $q'_{n+1}/q_n$ assumes arbitrarily large values, and

(11.10.2)    $\left|\frac{p}{q} - \xi\right| < \frac{\epsilon}{q^2}$

for every positive $\epsilon$ and an infinity of $p$ and $q$. Our next theorem shows that this is the general case, since (11.10.1) is true for ‘almost all’ $\xi$ in the sense of § 9.10.


Theorem 196. $a_n$ is unbounded for almost all $\xi$; the set of $\xi$ for which $a_n$ is bounded is null.

We may confine our attention to $\xi$ of (0, 1), so that $a_0 = 0$, and to irrational $\xi$, since the set of rationals is null. It is enough to show that the set $F_k$ of irrational $\xi$ for which

(11.10.3)    $a_n \le k$

is null; for the set for which $a_n$ is bounded is the sum of $F_1, F_2, F_3, \ldots$. We denote by

    $E_{a_1, a_2, \ldots, a_n}$
the set of irrational $\xi$ for which the first $n$ quotients have given values $a_1, a_2, \ldots, a_n$. The set $E_{a_1}$ lies in the interval

    $\frac{1}{a_1 + 1}, \quad \frac{1}{a_1}$,

which we call $I_{a_1}$. The set $E_{a_1, a_2}$ lies in

    $\cfrac{1}{a_1 + \cfrac{1}{a_2}}, \quad \cfrac{1}{a_1 + \cfrac{1}{a_2 + 1}}$,

which we call $I_{a_1, a_2}$. Generally, $E_{a_1, a_2, \ldots, a_n}$ lies in the interval $I_{a_1, a_2, \ldots, a_n}$ whose end points are

    $[a_1, a_2, \ldots, a_{n-1}, a_n + 1], \quad [a_1, a_2, \ldots, a_{n-1}, a_n]$

(the first being the left-hand end point when $n$ is odd). The intervals corresponding to different sets $a_1, a_2, \ldots, a_n$ are mutually exclusive (except that they may have end points in common), the choice of $a_{\nu+1}$ dividing up $I_{a_1, a_2, \ldots, a_\nu}$ into exclusive intervals. Thus $I_{a_1, a_2, \ldots, a_n}$ is the sum of

    $I_{a_1, a_2, \ldots, a_n, 1}, \quad I_{a_1, a_2, \ldots, a_n, 2}, \quad \ldots$.

The end points of $I_{a_1, a_2, \ldots, a_n}$ can also be expressed as

    $\frac{(a_n + 1) p_{n-1} + p_{n-2}}{(a_n + 1) q_{n-1} + q_{n-2}}, \quad \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}}$;

and its length (for which we use the same symbol as for the interval) is

    $\frac{1}{\{(a_n + 1) q_{n-1} + q_{n-2}\}(a_n q_{n-1} + q_{n-2})} = \frac{1}{(q_n + q_{n-1}) q_n}$.

Thus

    $I_{a_1} = \frac{1}{(a_1 + 1) a_1}$.

We denote by

    $E_{a_1, a_2, \ldots, a_n; k}$

the sub-set of $E_{a_1, a_2, \ldots, a_n}$ for which $a_{n+1} \le k$. The set is the sum of

    $E_{a_1, a_2, \ldots, a_n, a_{n+1}} \quad (a_{n+1} = 1, 2, \ldots, k)$.
The last set lies in the interval $I_{a_1, a_2, \ldots, a_n, a_{n+1}}$, whose end points are

    $[a_1, a_2, \ldots, a_n, a_{n+1} + 1], \quad [a_1, a_2, \ldots, a_n, a_{n+1}]$;

and so $E_{a_1, a_2, \ldots, a_n; k}$ lies in the interval $I_{a_1, a_2, \ldots, a_n; k}$ whose end points are

    $[a_1, a_2, \ldots, a_n, k + 1], \quad [a_1, a_2, \ldots, a_n, 1]$,

or

    $\frac{(k + 1) p_n + p_{n-1}}{(k + 1) q_n + q_{n-1}}, \quad \frac{p_n + p_{n-1}}{q_n + q_{n-1}}$.

The length of $I_{a_1, a_2, \ldots, a_n; k}$ is

    $\frac{k}{\{(k + 1) q_n + q_{n-1}\}(q_n + q_{n-1})}$,

and

(11.10.4)    $\frac{I_{a_1, a_2, \ldots, a_n; k}}{I_{a_1, a_2, \ldots, a_n}} = \frac{k q_n}{(k + 1) q_n + q_{n-1}} < \frac{k}{k + 1}$

for all $a_1, a_2, \ldots, a_n$.
Finally, we denote by

    $I_k^{(n)} = \sum_{a_1 \le k, \ldots, a_n \le k} I_{a_1, a_2, \ldots, a_n}$

the sum of the $I_{a_1, \ldots, a_n}$ for which $a_1 \le k, \ldots, a_n \le k$; and by $F_k^{(n)}$ the set of irrational $\xi$ for which $a_1 \le k, \ldots, a_n \le k$. Plainly $F_k^{(n)}$ is included in $I_k^{(n)}$.

First, $I_k^{(1)}$ is the sum of $I_{a_1}$ for $a_1 = 1, 2, \ldots, k$, and

    $I_k^{(1)} = \sum_{a_1 = 1}^{k} \frac{1}{a_1 (a_1 + 1)} = 1 - \frac{1}{k + 1} = \frac{k}{k + 1}$.

Generally, $I_k^{(n+1)}$ is the sum of the parts of the $I_{a_1, a_2, \ldots, a_n}$, included in $I_k^{(n)}$, for which $a_{n+1} \le k$, i.e. is

    $\sum_{a_1 \le k, \ldots, a_n \le k} I_{a_1, a_2, \ldots, a_n; k}$.
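Hence, by (11.10.4),

    $I_k^{(n+1)} < \frac{k}{k + 1} \sum_{a_1 \le k, \ldots, a_n \le k} I_{a_1, a_2, \ldots, a_n} = \frac{k}{k + 1}\, I_k^{(n)}$;

and so

    $I_k^{(n+1)} < \left(\frac{k}{k + 1}\right)^{n+1}$.

It follows that $F_k^{(n)}$ can be included in a set of intervals of length less than

    $\left(\frac{k}{k + 1}\right)^{n}$,

which tends to zero when $n \to \infty$. Since $F_k$ is part of $F_k^{(n)}$ for every $n$, the theorem follows.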
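For small $k$ and $n$ the total length $I_k^{(n)}$ can be computed exactly by enumerating the admissible quotients and summing the interval lengths $1/\{(q_n + q_{n-1}) q_n\}$. The sketch below does this and compares the result with the bound $(k/(k+1))^n$; the function name `total_length` is ours, and the enumeration is exponential in $n$, so it is meant only for very small cases.

```python
from fractions import Fraction
from itertools import product


def total_length(k, n):
    """Sum of the lengths of the intervals I_{a_1,...,a_n} with every a_i <= k."""
    total = Fraction(0)
    for quotients in product(range(1, k + 1), repeat=n):
        q_prev, q = 0, 1                  # q_{-1} = 0, q_0 = 1 (a_0 = 0)
        for a in quotients:
            q_prev, q = q, a * q + q_prev # q_i = a_i q_{i-1} + q_{i-2}
        total += Fraction(1, (q + q_prev) * q)
    return total


if __name__ == "__main__":
    for k in (1, 2, 3):
        for n in (1, 2, 3, 4):
            length = total_length(k, n)
            print(k, n, length, length <= Fraction(k, k + 1) ** n)
```

For $n = 1$ the sum telescopes to $k/(k+1)$ exactly, as in the text; for $n > 1$ the computed length is strictly smaller than the bound.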

It is possible to prove a good deal more by the same kind of argument. Thus Borel and F. Bernstein proved

Theorem 197*. If $\phi(n)$ is an increasing function of $n$ for which

(11.10.5)    $\sum \frac{1}{\phi(n)}$

is divergent, then the set of $\xi$ for which

(11.10.6)    $a_n \le \phi(n)$,

for all sufficiently large $n$, is null. On the other hand, if

(11.10.7)    $\sum \frac{1}{\phi(n)}$

is convergent, then (11.10.6) is true for almost all $\xi$ and sufficiently large $n$.

Theorem 196 is the special case of this theorem in which $\phi(n)$ is a constant. The proof of the general theorem is naturally a little more complex, but does not involve any essentially new idea.
11.11. Further theorems concerning approximation. Let us suppose, to fix our ideas, that $a_n$ tends steadily, fairly regularly, and not too rapidly, to infinity. Then

    $\left|\frac{p_n}{q_n} - \xi\right| = \frac{1}{q_n q'_{n+1}} \sim \frac{1}{a_{n+1} q_n^2} = \frac{1}{q_n \chi(q_n)}$,

where

    $\chi(q_n) = a_{n+1} q_n$.

There is a certain correspondence between the behaviour, in respect of convergence or divergence, of the series†

    $\sum_{\nu} \frac{1}{\chi(\nu)}, \quad \sum_{n} \frac{q_n}{\chi(q_n)}$;

and the latter series is

    $\sum \frac{1}{a_{n+1}}$.

These rough considerations suggest that, if we compare the inequalities

(11.11.1)    $a_n < \phi(n)$

and

(11.11.2)    $\left|\frac{p}{q} - \xi\right| < \frac{1}{q \chi(q)}$,

there should be a certain correspondence between conditions on the two series

    $\sum \frac{1}{\phi(n)}, \quad \sum \frac{1}{\chi(q)}$.

And the theorems of § 11.10 then suggest the two which follow.

Theorem 198. If

    $\sum \frac{1}{\chi(q)}$

is convergent, then the set of $\xi$ which satisfy (11.11.2) for an infinity of $q$ is null.

Theorem 199*. If $\chi(q)/q$ increases with $q$, and

    $\sum \frac{1}{\chi(q)}$

is divergent, then (11.11.2) is true, for an infinity of $q$, for almost all $\xi$.

† The idea is that underlying ‘Cauchy’s condensation test’ for the convergence or divergence of a series of decreasing positive terms. See Hardy, Pure mathematics, 9th ed., 354.
Theorem 199 is difficult. But Theorem 198 is very easy, and can be proved without continued fractions. It shows, roughly, that most irrationals cannot be approximated by rationals with an error of order much less than $q^{-2}$, e.g. with an error

    $O\left\{\frac{1}{q^2 (\log q)^2}\right\}$.

The more difficult theorem shows that approximation to such orders as

    $O\left(\frac{1}{q^2 \log q}\right), \quad O\left(\frac{1}{q^2 \log q \log\log q}\right), \quad \ldots$

is usually possible.

We may suppose $0 < \xi < 1$. We enclose every $p/q$ for which $q \ge N$ in an interval 

    $\left(\frac{p}{q} - \frac{1}{q\chi(q)},\ \frac{p}{q} + \frac{1}{q\chi(q)}\right)$.

There are less than $q$ values of $p$ corresponding to a given $q$, and the total length of the 
intervals is less (even without allowance for overlapping) than 

    $2 \sum_{q=N}^{\infty} \frac{1}{\chi(q)}$,

which tends to 0 when $N \to \infty$. Any $\xi$ which has the property is included in an interval, 
whatever be $N$, and the set of $\xi$ can therefore be included in a set of intervals whose total 
length is as small as we please. 

11.12. Simultaneous approximation. So far we have been concerned 
with approximations to a single irrational $\xi$. Dirichlet’s argument of § 11.3 
has an important application to a multi-dimensional problem, that of the 
simultaneous approximation of $k$ numbers 

    $\xi_1, \xi_2, \ldots, \xi_k$

by fractions 

    $\frac{p_1}{q}, \frac{p_2}{q}, \ldots, \frac{p_k}{q}$

with the same denominator $q$ (but not necessarily irreducible). 

Theorem 200. If $\xi_1, \xi_2, \ldots, \xi_k$ are any real numbers, then the system of 
inequalities 

(11.12.1)    $\left|\frac{p_i}{q} - \xi_i\right| < \frac{1}{q^{1+\mu}}$    $(\mu = 1/k;\ i = 1, 2, \ldots, k)$

has at least one solution. If one $\xi$ at least is irrational, then it has an infinity 
of solutions. 

We may plainly suppose that $0 \le \xi_i < 1$ for every $i$. We consider the $k$-dimensional 
‘cube’ defined by $0 \le x_i < 1$, and divide it into $Q^k$ ‘boxes’ by 
drawing ‘planes’ parallel to its faces at distances $1/Q$. Of the $Q^k + 1$ points 

    $((l\xi_1), (l\xi_2), \ldots, (l\xi_k))$    $(l = 0, 1, 2, \ldots, Q^k)$,

some two, corresponding say to $l = q_1$ and $l = q_2 > q_1$, must lie in the 
same box. Hence, taking $q = q_2 - q_1$, as in § 11.3, there is a $q \le Q^k$, and 
there are integers $p_1, p_2, \ldots, p_k$, such that 

    $|q\xi_i - p_i| < \frac{1}{Q} \le \frac{1}{q^{\mu}}$

for every $i$. 

The proof may be completed as before; if a $\xi$, say $\xi_i$, is irrational, then 
$\xi_i$ may be substituted for $\xi$ in the final argument of § 11.3. 

In particular we have 

Theorem 201. Given $\xi_1, \xi_2, \ldots, \xi_k$ and any positive $\epsilon$, we can find an 
integer $q$ so that $q\xi_i$ differs from an integer, for every $i$, by less than $\epsilon$. 
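
The box argument of Theorem 200 can be traced numerically. What follows is only a sketch in Python, under the assumption that floating-point values may stand in for the real numbers $\xi_i$ (so it illustrates the proof and proves nothing); the name `simultaneous_approximation` and its parameters are ours and do not occur in the text.

```python
import math


def simultaneous_approximation(xis, Q):
    """Return (q, ps) with 1 <= q <= Q**k and |q*xi_i - p_i| < 1/Q for every i.

    The points ((l*xi_1), ..., (l*xi_k)), l = 0..Q**k, are sorted into Q**k
    boxes of side 1/Q; two of them must share a box, and the difference of
    the two values of l is the required q.
    """
    k = len(xis)
    seen = {}  # box (tuple of k indices) -> first multiplier l that fell in it
    for l in range(Q**k + 1):
        # (x) = x mod 1 is the fractional part; the box index is floor(Q * (x)).
        box = tuple(math.floor(Q * ((l * xi) % 1.0)) for xi in xis)
        if box in seen:
            q = l - seen[box]
            ps = [round(q * xi) for xi in xis]  # nearest integers p_i
            return q, ps
        seen[box] = l
    # With exact real arithmetic this line cannot be reached.
    raise AssertionError("no two points shared a box")


xis = [math.sqrt(2), math.sqrt(3)]
q, ps = simultaneous_approximation(xis, 10)
print(q, ps, [abs(q * xi - p) for xi, p in zip(xis, ps)])
```

With $k = 2$ and $Q = 10$ the search inspects at most 101 points, and the $q$ it returns is at most $Q^k = 100$.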

11.13. The transcendence of e. We conclude this chapter by proving 
that $e$ and $\pi$ are transcendental. 

Our work will be considerably simplified by the introduction of a symbol 
$h^r$, which we define by 

    $h^0 = 1, \quad h^r = r! \quad (r \ge 1)$.

If $f(x)$ is any polynomial in $x$ of degree $m$, say 

    $f(x) = \sum_{r=0}^{m} c_r x^r$,

then we define $f(h)$ as 

    $\sum_{r=0}^{m} c_r h^r = \sum_{r=0}^{m} c_r r!$

(where $0!$ is to be interpreted as 1). Finally we define $f(x + h)$ in the 
manner suggested by Taylor's theorem, viz. as 

    $\sum_{r=0}^{m} \frac{f^{(r)}(x)}{r!} h^r = \sum_{r=0}^{m} f^{(r)}(x)$.

If $f(x + y) = F(y)$, then $f(x + h) = F(h)$. 
We define $u_r(x)$ and $\epsilon_r(x)$, for $r = 0, 1, 2, \ldots$, by 

    $u_r(x) = \frac{x}{r+1} + \frac{x^2}{(r+1)(r+2)} + \cdots = e^{|x|} \epsilon_r(x)$.

It is obvious that $|u_r(x)| < e^{|x|}$, and so 

(11.13.1)    $|\epsilon_r(x)| < 1$,

for all $x$. 

We require two lemmas. 

Theorem 202. If $\phi(x)$ is any polynomial and 

(11.13.2)    $\phi(x) = \sum_{r=0}^{s} c_r x^r, \quad \psi(x) = \sum_{r=0}^{s} c_r \epsilon_r(x) x^r$,

then 

(11.13.3)    $e^x \phi(h) = \phi(x + h) + \psi(x) e^{|x|}$.

By our definitions above we have 

    $(x + h)^r = h^r + r x h^{r-1} + \frac{r(r-1)}{1 \cdot 2} x^2 h^{r-2} + \cdots + x^r$

    $= r! + r(r-1)!\,x + \frac{r(r-1)}{1 \cdot 2} (r-2)!\,x^2 + \cdots + x^r$

    $= r! \left(1 + x + \frac{x^2}{2!} + \cdots + \frac{x^r}{r!}\right)$

    $= r!\,e^x - u_r(x) x^r = e^x h^r - u_r(x) x^r$.

Hence 

    $e^x h^r = (x + h)^r + u_r(x) x^r = (x + h)^r + e^{|x|} \epsilon_r(x) x^r$.

Multiplying this throughout by $c_r$, and summing, we obtain (11.13.3). 
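
As an illustration only, the rules for the symbol $h$ can be checked with exact rational arithmetic. In the Python sketch below a polynomial is represented by its list of coefficients $c_0, c_1, \ldots$; $f(h)$ is formed by replacing $h^r$ by $r!$, and $f(x + h)$ as the sum of the derivatives of $f$ at $x$. The helper names are hypothetical and chosen for this example.

```python
from fractions import Fraction
from math import factorial


def eval_poly(coeffs, x):
    """Value of sum(coeffs[r] * x**r)."""
    return sum(c * x**r for r, c in enumerate(coeffs))


def derivative(coeffs):
    """Coefficient list of the derivative (empty once the polynomial is exhausted)."""
    return [r * c for r, c in enumerate(coeffs)][1:]


def eval_h(coeffs):
    """f(h): each power h**r is replaced by r!."""
    return sum(c * factorial(r) for r, c in enumerate(coeffs))


def eval_shifted_h(coeffs, x):
    """f(x + h), defined as f(x) + f'(x) + f''(x) + ... ."""
    total, current = 0, list(coeffs)
    while current:
        total += eval_poly(current, x)
        current = derivative(current)
    return total


def factorial_times_truncated_exp(r, x):
    """r! * (1 + x + x**2/2! + ... + x**r/r!), the closed form of (x + h)**r."""
    return sum(Fraction(factorial(r), factorial(j)) * x**j for j in range(r + 1))


x = Fraction(3, 2)
for r in range(5):
    monomial = [0] * r + [1]  # the polynomial x**r
    print(r, eval_shifted_h(monomial, x), factorial_times_truncated_exp(r, x))
```

The two printed values agree for each $r$, which is the third line of the displayed computation of $(x + h)^r$.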

As in § 7.2, we call a polynomial in $x$, or in $x, y, \ldots$, whose coefficients 
are integers, an integral polynomial in $x$, or $x, y, \ldots$. 

Theorem 203. If $m \ge 2$, $f(x)$ is an integral polynomial in $x$, and 

    $F_1(x) = \frac{x^{m-1}}{(m-1)!} f(x), \quad F_2(x) = \frac{x^m}{(m-1)!} f(x)$,

then $F_1(h)$, $F_2(h)$ are integers and 

    $F_1(h) \equiv f(0), \quad F_2(h) \equiv 0 \pmod{m}$.

Suppose that 

    $f(x) = \sum_{l=0}^{L} a_l x^l$,

where $a_0, \ldots, a_L$ are integers. Then 

    $F_1(x) = \sum_{l=0}^{L} a_l \frac{x^{l+m-1}}{(m-1)!}$,

and so 

    $F_1(h) = \sum_{l=0}^{L} a_l \frac{(l+m-1)!}{(m-1)!}$.

But 

    $\frac{(l+m-1)!}{(m-1)!} = (l+m-1)(l+m-2) \cdots m$

is an integral multiple of $m$ if $l \ge 1$; and therefore 

    $F_1(h) \equiv a_0 = f(0) \pmod{m}$.

Similarly 

    $F_2(x) = \sum_{l=0}^{L} a_l \frac{x^{l+m}}{(m-1)!}$,

    $F_2(h) = \sum_{l=0}^{L} a_l \frac{(l+m)!}{(m-1)!} \equiv 0 \pmod{m}$.
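
The two congruences of Theorem 203 are easy to test on examples. The following Python sketch is a numerical check, not a proof; the function `F_at_h` and its `shift` parameter are names introduced here for illustration.

```python
from math import factorial


def F_at_h(f_coeffs, m, shift):
    """Value at h of x**(m - 1 + shift) * f(x) / (m - 1)!, reading h**r as r!.

    f_coeffs lists the integer coefficients a_0, a_1, ..., a_L of f.
    shift = 0 gives F_1(h); shift = 1 gives F_2(h).
    """
    total = 0
    for l, a in enumerate(f_coeffs):
        # (l + m - 1 + shift)! / (m - 1)! is a product of consecutive integers.
        total += a * (factorial(l + m - 1 + shift) // factorial(m - 1))
    return total


f = [7, -3, 0, 5]  # f(x) = 7 - 3x + 5x^3, so f(0) = 7
m = 5
print(F_at_h(f, m, 0) % m, f[0] % m)  # F_1(h) and f(0) leave the same residue
print(F_at_h(f, m, 1) % m)            # F_2(h) is divisible by m
```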


We are now in a position to prove the first of our two main theorems, 
namely 

Theorem 204. e is transcendental. 

If the theorem is not true, then 

(11.13.4)    $\sum_{t=0}^{n} C_t e^t = 0$,

where $n \ge 1$, $C_0, C_1, \ldots, C_n$ are integers, and $C_0 \ne 0$. 

We suppose that $p$ is a prime greater than $\max(n, |C_0|)$, and define 
$\phi(x)$ by 

    $\phi(x) = \frac{x^{p-1}}{(p-1)!} \{(x-1)(x-2) \cdots (x-n)\}^p$.

Ultimately, $p$ will be large. If we multiply (11.13.4) by $\phi(h)$ and use 
(11.13.3), we obtain 

    $\sum_{t=0}^{n} C_t \phi(t + h) + \sum_{t=0}^{n} C_t \psi(t) e^t = 0$,

or 

(11.13.5)    $S_1 + S_2 = 0$,

say. 

By Theorem 203, with $m = p$, $\phi(h)$ is an integer and 

    $\phi(h) \equiv (-1)^{pn} (n!)^p \pmod{p}$.

Again, if $1 \le t \le n$, 

    $\phi(t + x) = \frac{(t+x)^{p-1}}{(p-1)!} \{(x+t-1) \cdots x(x-1) \cdots (x+t-n)\}^p = \frac{x^p}{(p-1)!} f(x)$,

where $f(x)$ is an integral polynomial in $x$. It follows (again from Theorem 
203) that $\phi(t + h)$ is an integer divisible by $p$. Hence 

    $S_1 = \sum_{t=0}^{n} C_t \phi(t + h) \equiv (-1)^{pn} C_0 (n!)^p \not\equiv 0 \pmod{p}$,

since $C_0 \ne 0$ and $p > \max(n, |C_0|)$. Thus $S_1$ is an integer, not zero; and 
therefore 

(11.13.6)    $|S_1| \ge 1$.

On the other hand, $|\epsilon_r(x)| < 1$, by (11.13.1), and so 

    $|\psi(t)| < \sum_{r=0}^{s} |c_r| t^r \le \frac{t^{p-1}}{(p-1)!} \{(t+1)(t+2) \cdots (t+n)\}^p \to 0$,

when $p \to \infty$. Hence $S_2 \to 0$, and we can make 

(11.13.7)    $|S_2| < \tfrac{1}{2}$

by choosing a sufficiently large value of $p$. The formulae (11.13.5), 
(11.13.6), and (11.13.7) are in contradiction. Hence (11.13.4) is impossible 
and $e$ is transcendental. 

The proof which precedes is a good deal more sophisticated than the 
simple proof of the irrationality of e given in § 4.7, but the ideas which 
underlie it are essentially the same. We use (i) the exponential series and 
(ii) the theorem that an integer whose modulus is less than 1 must be 0. 

11.14. The transcendence of $\pi$. Finally we prove that $\pi$ is transcendental. 
It is this theorem which settles the problem of the ‘quadrature of 
the circle’. 

Theorem 205. $\pi$ is transcendental. 

The proof is very similar to that of Theorem 204, but there are one or 
two slight additional complications. 

Suppose that $\beta_1, \beta_2, \ldots, \beta_m$ are the roots of an equation 

    $d x^m + d_1 x^{m-1} + \cdots + d_m = 0$

with integral coefficients. Any symmetrical integral polynomial in 

    $d\beta_1, d\beta_2, \ldots, d\beta_m$

is an integral polynomial in 

    $d_1, d_2, \ldots, d_m$,

and is therefore an integer. 

Now let us suppose that $\pi$ is algebraic. Then $i\pi$ is algebraic,* and 
therefore the root of an equation 

    $d x^m + d_1 x^{m-1} + \cdots + d_m = 0$,

where $m \ge 1$, $d, d_1, \ldots, d_m$ are integers, and $d \ne 0$. If the roots of this 
equation are 

    $\omega_1, \omega_2, \ldots, \omega_m$,

then $1 + e^{\omega} = 1 + e^{i\pi} = 0$ for some $\omega$, and therefore 

    $(1 + e^{\omega_1})(1 + e^{\omega_2}) \cdots (1 + e^{\omega_m}) = 0$.

* If $a_0 x^n + a_1 x^{n-1} + \cdots + a_n = 0$ and $y = ix$, then 

    $a_0 y^n - a_2 y^{n-2} + \cdots + i(a_1 y^{n-1} - a_3 y^{n-3} + \cdots) = 0$

and so 

    $(a_0 y^n - a_2 y^{n-2} + \cdots)^2 + (a_1 y^{n-1} - a_3 y^{n-3} + \cdots)^2 = 0$.

Multiplying this out, we obtain 

(11.14.1)    $1 + \sum_{t=1}^{2^m - 1} e^{\alpha_t} = 0$,

where 

(11.14.2)    $\alpha_1, \alpha_2, \ldots, \alpha_{2^m - 1}$

are the $2^m - 1$ numbers 

    $\omega_1, \ldots, \omega_m, \omega_1 + \omega_2, \omega_1 + \omega_3, \ldots, \omega_1 + \omega_2 + \cdots + \omega_m$

in some order. 

Let us suppose that $C - 1$ of the $\alpha$ are zero and that the remaining 

    $n = 2^m - 1 - (C - 1)$

are not zero; and that the non-zero $\alpha$ are arranged first, so that (11.14.2) 
reads 

    $\alpha_1, \ldots, \alpha_n, 0, 0, \ldots, 0$.

Then it is clear that any symmetrical integral polynomial in 

(11.14.3)    $d\alpha_1, \ldots, d\alpha_n$

is a symmetrical integral polynomial in 

    $d\alpha_1, \ldots, d\alpha_n, 0, 0, \ldots, 0$,

i.e. in 

    $d\alpha_1, d\alpha_2, \ldots, d\alpha_{2^m - 1}$.

Hence any such function is a symmetrical integral polynomial in 

    $d\omega_1, d\omega_2, \ldots, d\omega_m$,

and so an integer. 

We can write (11.14.1) as 

(11.14.4)    $C + \sum_{t=1}^{n} e^{\alpha_t} = 0$.

We choose a prime $p$ such that 

(11.14.5)    $p > \max(d, C, |d^n \alpha_1 \cdots \alpha_n|)$

and define $\phi(x)$ by 

(11.14.6)    $\phi(x) = \frac{d^{np+p-1} x^{p-1}}{(p-1)!} \{(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)\}^p$.

Multiplying (11.14.4) by $\phi(h)$, and using (11.13.3), we obtain 

(11.14.7)    $S_0 + S_1 + S_2 = 0$,

where 

(11.14.8)    $S_0 = C\phi(h)$,

(11.14.9)    $S_1 = \sum_{t=1}^{n} \phi(\alpha_t + h)$,

(11.14.10)    $S_2 = \sum_{t=1}^{n} \psi(\alpha_t) e^{|\alpha_t|}$.

Now 

    $\phi(x) = \frac{x^{p-1}}{(p-1)!} \sum_{l=0}^{np} g_l x^l$,

where $g_l$ is a symmetric integral polynomial in the numbers (11.14.3), and 
so an integer. It follows from Theorem 203 that $\phi(h)$ is an integer, and that 

(11.14.11)    $\phi(h) \equiv g_0 = (-1)^{pn} d^{p-1} (d\alpha_1 \cdot d\alpha_2 \cdots d\alpha_n)^p \pmod{p}$.

Hence $S_0$ is an integer; and 

(11.14.12)    $S_0 \equiv C g_0 \not\equiv 0 \pmod{p}$,

because of (11.14.5). 

Next, by substitution and rearrangement, we see that 

    $\phi(\alpha_t + x) = \frac{x^p}{(p-1)!} \sum_{l=0}^{np-1} f_{l,t} x^l$,

where 

    $f_{l,t} = f_l(d\alpha_t; d\alpha_1, d\alpha_2, \ldots, d\alpha_{t-1}, d\alpha_{t+1}, \ldots, d\alpha_n)$

is an integral polynomial in the numbers (11.14.3), symmetrical in all but 
$d\alpha_t$. Hence 

    $\sum_{t=1}^{n} \phi(\alpha_t + x) = \frac{x^p}{(p-1)!} \sum_{l=0}^{np-1} F_l x^l$,

where 

    $F_l = \sum_{t=1}^{n} f_{l,t} = \sum_{t=1}^{n} f_l(d\alpha_t; d\alpha_1, \ldots, d\alpha_{t-1}, d\alpha_{t+1}, \ldots, d\alpha_n)$.

It follows that $F_l$ is an integral polynomial symmetrical in all the numbers 
(11.14.3), and so an integer. Hence, by Theorem 203, 

    $S_1 = \sum_{t=1}^{n} \phi(\alpha_t + h)$

is an integer, and 

(11.14.13)    $S_1 \equiv 0 \pmod{p}$.

From (11.14.12) and (11.14.13) it follows that $S_0 + S_1$ is an integer not 
divisible by $p$, and so that 

(11.14.14)    $|S_0 + S_1| \ge 1$.

On the other hand, 

    $|\psi(x)| < \frac{|d|^{np+p-1} |x|^{p-1}}{(p-1)!} \{(|x| + |\alpha_1|) \cdots (|x| + |\alpha_n|)\}^p \to 0$,

for any fixed $x$, when $p \to \infty$. It follows that 

(11.14.15)    $|S_2| < \tfrac{1}{2}$

for sufficiently large $p$. The three formulae (11.14.7), (11.14.14), and 
(11.14.15) are in contradiction, and therefore $\pi$ is transcendental. 

In particular $\pi$ is not a ‘Euclidean’ number in the sense of § 11.5; and 
therefore it is impossible to construct, by Euclidean methods, a length equal 
to the circumference of a circle of unit diameter. 

It may be proved by the methods of this section that 

    $\alpha_1 e^{\beta_1} + \alpha_2 e^{\beta_2} + \cdots + \alpha_s e^{\beta_s} \ne 0$

if the $\alpha$ and $\beta$ are algebraic, the $\alpha$ are not all zero, and no two $\beta$ are equal. 

It has been proved more recently that $\alpha^{\beta}$ is transcendental if $\alpha$ and $\beta$ are 
algebraic, $\alpha$ is not 0 or 1, and $\beta$ is irrational. This shows, in particular, that 
$e^{-\pi}$, which is one of the values of $i^{2i}$, is transcendental. It also shows that 

    $\theta = \frac{\log 3}{\log 2}$

is transcendental, since $2^{\theta} = 3$ and $\theta$ is irrational.† 


NOTES 

§ 11.3. Dirichlet’s argument depends upon the principle ‘if there are $n + 1$ objects in $n$ 
boxes, there must be at least one box which contains two (or more) of the objects’ (the 
Schubfachprinzip of German writers). That in § 11.12 is essentially the same. 

§§ 11.6-7. A full account of Cantor’s work in the theory of aggregates (Mengenlehre) 
will be found in Hobson’s Theory of functions of a real variable, i. 

Liouville’s work was published in the Journal de Math. (1) 16 (1851), 133-42, over 
twenty years before Cantor’s. See also the note on §§ 11.13-14. 

Theorem 191 has been improved successively by Thue, Siegel, Dyson, and Gelfond. 
Finally Roth (Mathematika, 2 (1955), 1-20) showed that no irrational algebraic number is 
approximable to any order greater than 2. Roth’s result can be re-phrased by saying that if 
one takes $\chi(q) = q^{1+\epsilon}$ in Theorem 198, with any fixed $\epsilon > 0$, then the resulting null set 
contains no irrational algebraic numbers. It is not known whether this remains true with any 
essentially smaller function $\chi(q)$. For an account of Schmidt’s generalization of this to the 
simultaneous approximation to several algebraic numbers, see Baker, ch. 7, Th. 7.1 et seq. 
See also Bombieri and Gubler, Heights in Diophantine geometry (Cambridge University 
Press, Cambridge, 2006) for an account of the more general Subspace Theorem and its 
p-adic extensions. For stricter limitations on the degree of rational approximation possible 
to specific irrationals, e.g. $\sqrt[3]{2}$, see Baker, Quart. J. Math. Oxford (2) 15 (1964), 375-83. 
Currently (2007) it is known that 

    $\left|\sqrt[3]{2} - \frac{p}{q}\right| > \frac{1}{4 q^{2.4325}}$

for all positive integers $p, q$ (see Voutier, J. Theor. Nombres Bordeaux 19 (2007), 265-90). 

† See § 4.7. 

§§ 11.8-9. Theorems 193 and 194 are due to Hurwitz, Math. Ann. 39 (1891), 279-84; 
and Theorem 195 to Borel, Journal de Math. (5), 9 (1903), 329-75. Our proofs follow 
Perron (Kettenbrüche, 49-52, and Irrationalzahlen, 129-31). 

§ 11.10. The theorem with $2\sqrt{2}$ is also due to Hurwitz, loc. cit. supra. For fuller 
information see Koksma, 29 et seq. 

Theorems 196 and 197 were proved by Borel, Rendiconti del circolo mat. di Palermo, 
27 (1909), 247-71, and F. Bernstein, Math. Ann. 71 (1912), 417-39. 

For further refinements see Khintchine, Compositio Math. 1 (1934), 361-83, and Dyson, 
Journal London Math. Soc. 18 (1943), 40-43. 

§ 11.11. For Theorem 199 see Khintchine, Math. Ann. 92 (1924), 115-25. 

§ 11.12. We lost nothing by supposing p/q irreducible throughout §§ 11.1-11. 

Suppose, for example, that $p/q$ is a reducible solution of (11.1.1). Then if $(p, q) = d$ with 
$d > 1$, and we write $p = dp'$, $q = dq'$, we have $(p', q') = 1$ and 



so that $p'/q'$ is an irreducible solution of (11.1.1). 

This sort of reduction is no longer possible when we require a number of rational fractions 
with the same denominator, and some of our conclusions here would become false if we 
insisted on irreducibility. For example, in order that the system (11.12.1) should have an 
infinity of solutions, it would be necessary, after § 11.1 (1), that every $\xi_i$ should be irrational. 

We owe this remark to Dr. Wylie. 

§§ 11.13-14. The transcendence of $e$ was proved first by Hermite, Comptes rendus, 77 
(1873), 18-24, etc. (Œuvres, iii. 150-81); and that of $\pi$ by F. Lindemann, Math. Ann. 20 
(1882), 213-25. The proofs were afterwards modified and simplified by Hilbert, Hurwitz, 
and other writers. The form in which we give them is in essentials the same as that in 
Landau, Vorlesungen, iii. 90-95, or Perron, Irrationalzahlen, 174-82. 

Nesterenko (Sb. Math. 187 (1996), 1319-1348) showed that $\pi$ and $e^{\pi}$ are 
algebraically independent in the sense that there is no non-zero polynomial $P(x, y)$ with rational 
coefficients such that $P(\pi, e^{\pi}) = 0$. This result includes the transcendence of both numbers. 

The problem of proving the transcendentality of $\alpha^{\beta}$, under the conditions stated at the 
end of § 11.14, was propounded by Hilbert in 1900, and solved independently by Gelfond 
and Schneider, by different methods, in 1934. Fuller details, and references to the proofs of 
the transcendentality of the other numbers mentioned at the end of § 11.7, will be found in 
Koksma, ch. iv. and in Baker, ch. 2. Baker’s book gives an up-to-date account of the whole 
subject of transcendental numbers, in which there have been important recent advances by 
him and others. 

It is unknown whether $\log 2$ and $\log 3$ are algebraically independent, or indeed if there 
exist any two non-zero algebraic numbers $\alpha, \beta$ such that $\log \alpha$ and $\log \beta$ are algebraically 
independent. 

THE FUNDAMENTAL THEOREM OF ARITHMETIC 
IN $k(1)$, $k(i)$, AND $k(\rho)$ 

12.1. Algebraic numbers and integers. In this chapter we consider 
some simple generalizations of the notion of an integer. 

We defined an algebraic number in § 11.5; $\xi$ is an algebraic number if it 
is a root of an equation 

    $c_0 \xi^n + c_1 \xi^{n-1} + \cdots + c_n = 0 \quad (c_0 \ne 0)$

whose coefficients are rational integers.† If 

    $c_0 = 1$,

then $\xi$ is said to be an algebraic integer. This is the natural definition, since 
a rational $\xi = a/b$ satisfies $b\xi - a = 0$, and is an integer when $b = 1$. 
Thus 

    $i = \sqrt{-1}$

and 

(12.1.1)    $\rho = e^{\frac{2}{3}\pi i} = \tfrac{1}{2}(-1 + i\sqrt{3})$

are algebraic integers, since 

    $i^2 + 1 = 0$

and 

    $\rho^2 + \rho + 1 = 0$.

When $n = 2$, $\xi$ is said to be a quadratic number, or integer, as the case 
may be. 

These definitions enable us to restate Theorem 45 in the form 

Theorem 206. An algebraic integer, if rational, is a rational integer. 

† We defined the ‘rational integers’ in § 1.1. Since then we have described them simply as the 
‘integers’, but now it becomes important to distinguish them explicitly from integers of other kinds. 

12.2. The rational integers, the Gaussian integers, and the integers 
of $k(\rho)$. For the present we shall be concerned only with the three simplest 
classes of algebraic integers. 

(1) The rational integers (defined in § 1.1) are the algebraic integers for 
which $n = 1$. For reasons which will appear later, we shall call the rational 
integers the integers of $k(1)$.† 

(2) The complex or ‘Gaussian’ integers are the numbers 

    $\xi = a + bi$,

where $a$ and $b$ are rational integers. Since 

    $\xi^2 - 2a\xi + a^2 + b^2 = 0$,

a Gaussian integer is a quadratic integer. We call the Gaussian integers the 
integers of $k(i)$. In particular, any rational integer is a Gaussian integer. 
Since 

    $(a + bi) + (c + di) = (a + c) + (b + d)i$,

    $(a + bi)(c + di) = ac - bd + (ad + bc)i$,

sums and products of Gaussian integers are Gaussian integers. More 
generally, if $\alpha, \beta, \ldots, \kappa$ are Gaussian integers, and 

    $\xi = P(\alpha, \beta, \ldots, \kappa)$,

where $P$ is a polynomial whose coefficients are rational or Gaussian 
integers, then $\xi$ is a Gaussian integer. 

(3) If $\rho$ is defined by (12.1.1), then 

    $\rho^2 = e^{\frac{4}{3}\pi i} = \tfrac{1}{2}(-1 - i\sqrt{3})$,

    $\rho + \rho^2 = -1, \quad \rho\rho^2 = 1$.

If 

    $\xi = a + b\rho$,

† We shall define $k(\vartheta)$ generally in § 14.1. $k(1)$ is in fact the class of rationals; we shall not use a 
special symbol for the sub-class of rational integers. $k(i)$ is the class of numbers $r + si$, where $r$ and $s$ 
are rational; and $k(\rho)$ is defined similarly. 

where $a$ and $b$ are rational integers, then 

    $(\xi - a - b\rho)(\xi - a - b\rho^2) = 0$

or 

    $\xi^2 - (2a - b)\xi + a^2 - ab + b^2 = 0$,

so that $\xi$ is a quadratic integer. We call the numbers $\xi$ the integers of $k(\rho)$. 
Since 

    $\rho^2 + \rho + 1 = 0, \quad a + b\rho = a - b - b\rho^2, \quad a + b\rho^2 = a - b - b\rho$,

we might equally have defined the integers of $k(\rho)$ as the numbers $a + b\rho^2$. 

The properties of the integers of $k(i)$ and $k(\rho)$ resemble in many ways 
those of the rational integers. Our object in this chapter is to study the 
simplest properties common to the three classes of numbers, and in particular 
the property of ‘unique factorization’. This study is important for 
two reasons, first because it is interesting to see how far the properties of 
ordinary integers are susceptible to generalization, and secondly because 
many properties of the rational integers themselves follow most simply and 
most naturally from those of wider classes. 

We shall use small Latin letters $a, b, \ldots$, as we have usually done, to 
denote rational integers, except that $i$ will always be $\sqrt{-1}$. Integers of 
$k(i)$ or $k(\rho)$ will be denoted by Greek letters $\alpha, \beta, \ldots$. 

12.3. Euclid’s algorithm. We have already proved the ‘fundamental 
theorem of arithmetic’, for the rational integers, by two different methods, 
in §§ 2.10 and 2.11. We shall now give a third proof which is important 
both logically and historically and will serve us as a model when extending 
it to other classes of numbers.† 

Suppose that 

    $a \ge b > 0$.

Dividing $a$ by $b$ we obtain 

    $a = q_1 b + r_1$,

† The fundamental idea of the proof is the same as that of the proof of § 2.10: the numbers divisible 
by $d = (a, b)$ form a ‘modulus’. But here we determine $d$ by a direct construction. 

where $0 \le r_1 < b$. If $r_1 \ne 0$, we can repeat the process, and obtain 

    $b = q_2 r_1 + r_2$,

where $0 \le r_2 < r_1$. If $r_2 \ne 0$, 

    $r_1 = q_3 r_2 + r_3$,

where $0 \le r_3 < r_2$; and so on. The non-negative integers $b, r_1, r_2, \ldots$, 
form a decreasing sequence, and so 

    $r_{n+1} = 0$,

for some $n$. The last two steps of the process will be 

    $r_{n-2} = q_n r_{n-1} + r_n \quad (0 < r_n < r_{n-1})$,

    $r_{n-1} = q_{n+1} r_n$.

This system of equations for $r_1, r_2, \ldots$ is known as Euclid's algorithm. It 
is the same, except for notation, as that of § 10.6. 

Euclid’s algorithm embodies the ordinary process for finding the highest 
common divisor of a and b, as is shown by the next theorem. 

Theorem 207: $r_n = (a, b)$. 

Let $d = (a, b)$. Then, using the successive steps of the algorithm, we 
have 

    $d \mid a \,.\, d \mid b \to d \mid r_1 \to d \mid r_2 \to \cdots \to d \mid r_n$,

so that $d \le r_n$. Again, working backwards, 

    $r_n \mid r_{n-1} \to r_n \mid r_{n-2} \to r_n \mid r_{n-3} \to \cdots \to r_n \mid b \to r_n \mid a$.

Hence $r_n$ divides both $a$ and $b$. Since $d$ is the greatest of the common 
divisors of $a$ and $b$, it follows that $r_n \le d$, and therefore that $r_n = d$. 
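
The successive divisions of § 12.3 translate directly into a short loop. The Python sketch below records the remainders $r_1, r_2, \ldots$ and reads off the last non-zero one, as Theorem 207 prescribes; the function names are ours, chosen for the illustration.

```python
def euclid_remainders(a, b):
    """Return [r_1, r_2, ..., r_n, 0] from a = q_1*b + r_1, b = q_2*r_1 + r_2, ..."""
    if not a >= b > 0:
        raise ValueError("the algorithm is stated for a >= b > 0")
    remainders = []
    while b != 0:
        a, b = b, a % b  # one line of the algorithm: new divisor, new remainder
        remainders.append(b)
    return remainders


def highest_common_divisor(a, b):
    """(a, b) as the last non-zero remainder r_n."""
    rs = euclid_remainders(a, b)
    # If b divides a there is no non-zero remainder, and (a, b) = b.
    return rs[-2] if len(rs) >= 2 else b


print(euclid_remainders(1071, 462))       # [147, 21, 0]
print(highest_common_divisor(1071, 462))  # 21
```

Here $1071 = 2 \cdot 462 + 147$, $462 = 3 \cdot 147 + 21$, $147 = 7 \cdot 21$, so that $r_n = 21$.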

12.4. Application of Euclid’s algorithm to the fundamental theorem 
in $k(1)$. We base the proof of the fundamental theorem on two preliminary 
theorems. The first is merely a repetition of Theorem 26, but it is convenient 
to restate it and deduce it from the algorithm. The second is substantially 
equivalent to Theorem 3. 

Theorem 208. If $f \mid a$, $f \mid b$, then $f \mid (a, b)$. 

For 

    $f \mid a \,.\, f \mid b \to f \mid r_1 \to f \mid r_2 \to \cdots \to f \mid r_n$,

or $f \mid d$. 

Theorem 209. If $(a, b) = 1$ and $b \mid ac$, then $b \mid c$. 

If we multiply each line of the algorithm by $c$, we obtain 

    $ac = q_1 bc + r_1 c$,

    $\ldots\ldots\ldots$

    $r_{n-2} c = q_n r_{n-1} c + r_n c$,

    $r_{n-1} c = q_{n+1} r_n c$,

which is the algorithm we should have obtained if we started with $ac$ 
and $bc$ instead of $a$ and $b$. Here 

    $r_n = (a, b) = 1$

and so 

    $(ac, bc) = r_n c = c$.

Now $b \mid ac$, by hypothesis, and $b \mid bc$. Hence, by Theorem 208, 

    $b \mid (ac, bc) = c$,

which is what we had to prove. 

If $p$ is a prime, then either $p \mid a$ or $(a, p) = 1$. In the latter case, by Theorem 
209, $p \mid ac$ implies $p \mid c$. Thus $p \mid ac$ implies $p \mid a$ or $p \mid c$. This is Theorem 3, and 
from Theorem 3 the fundamental theorem follows as in § 1.3. 

It will be useful to restate the fundamental theorem in a slightly different 
form which extends more naturally to the integers of $k(i)$ and $k(\rho)$. We call 
the numbers 

    $\epsilon = \pm 1$,

the divisors of 1, the unities of $k(1)$. The two numbers 

    $\epsilon m$

we call associates. Finally we define a prime as an integer of $k(1)$ which is 
not 0 or a unity and is not divisible by any number except the unities and 
its associates. The primes are then 

    $\pm 2, \pm 3, \pm 5, \ldots$,

and the fundamental theorem takes the form: any integer $n$ of $k(1)$, not 0 
or a unity, can be expressed as a product of primes, and the expression is 
unique except in regard to (a) the order of the factors, (b) the presence of 
unities as factors, and (c) ambiguities between associated primes. 

12.5. Historical remarks on Euclid’s algorithm and the fundamental 
theorem. Euclid’s algorithm is explained at length in Book vii of the 
Elements (Props. 1-3). Euclid deduces from the algorithm, effectively, 
that 

    $f \mid a \,.\, f \mid b \to f \mid (a, b)$

and 

    $(ac, bc) = (a, b)c$.

He has thus the weapons which were essential in our proof. 

The actual theorem which he proves (vii. 24) is ‘if two numbers be prime 
to any number, their product also will be prime to the same’; i.e. 

(12.5.1)    $(a, c) = 1 \,.\, (b, c) = 1 \to (ab, c) = 1$.

Our Theorem 3 follows from this by taking $c$ a prime $p$, and we can prove 
(12.5.1) by a slight change in the argument of § 12.4. But Euclid’s method 
of proof, which depends on the notions of ‘parts’ and ‘proportion’, is 
essentially different. 

It might seem strange at first that Euclid, having gone so far, could 
not prove the fundamental theorem itself; but this view would rest on a 
misconception. Euclid had no formal calculus of multiplication and 
exponentiation, and it would have been most difficult for him even to state 
the theorem. He had not even a term for the product of more than three 
factors. The omission of the fundamental theorem is in no way casual or 
accidental; Euclid knew very well that the theory of numbers turned upon 
his algorithm, and drew from it all the return he could. 

12.6. Properties of the Gaussian integers. Throughout this and the 
next two sections the word ‘integer’ means Gaussian integer or integer 
of $k(i)$. 

We define ‘divisible’ and ‘divisor’ in $k(i)$ in the same way as in $k(1)$; 
an integer $\xi$ is said to be divisible by an integer $\eta$, not 0, if there exists an 
integer $\zeta$ such that 

    $\xi = \eta\zeta$;

and $\eta$ is then said to be a divisor of $\xi$. We express this by $\eta \mid \xi$. Since 
$1, -1, i, -i$ are all integers, any $\xi$ has the eight ‘trivial’ divisors 

    $1, \xi, -1, -\xi, i, i\xi, -i, -i\xi$.

Divisibility has the obvious properties expressed by 

    $\alpha \mid \beta \,.\, \beta \mid \gamma \to \alpha \mid \gamma$,

    $\alpha \mid \gamma_1 \,.\, \ldots \,.\, \alpha \mid \gamma_n \to \alpha \mid \beta_1\gamma_1 + \cdots + \beta_n\gamma_n$.

The integer $\epsilon$ is said to be a unity of $k(i)$ if $\epsilon \mid \xi$ for every $\xi$ of $k(i)$. 
Alternatively, we may define a unity as any integer which is a divisor of 1. 
The two definitions are equivalent, since 1 is a divisor of every integer of 
the field, and 

    $\epsilon \mid 1 \,.\, 1 \mid \xi \to \epsilon \mid \xi$.

The norm of an integer $\xi$ is defined by 

    $N\xi = N(a + bi) = a^2 + b^2$.

If $\bar{\xi}$ is the conjugate of $\xi$, then 

    $N\xi = \xi\bar{\xi} = |\xi|^2$.

Since 

    $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$,

$N\xi$ has the properties 

    $N\xi N\eta = N(\xi\eta), \quad N\xi N\eta \cdots = N(\xi\eta \cdots)$.

Theorem 210. The norm of a unity is 1, and any integer whose norm is 
1 is a unity. 

If $\epsilon$ is a unity, then $\epsilon \mid 1$. Hence $1 = \epsilon\eta$, and so 

    $1 = N\epsilon N\eta, \quad N\epsilon \mid 1, \quad N\epsilon = 1$.

On the other hand, if $N(a + bi) = 1$, we have 

    $1 = a^2 + b^2 = (a + bi)(a - bi), \quad a + bi \mid 1$,

and so $a + bi$ is a unity. 

Theorem 211. The unities of $k(i)$ are 

    $\epsilon = i^s \quad (s = 0, 1, 2, 3)$.

The only solutions of $a^2 + b^2 = 1$ are 

    $a = \pm 1, \ b = 0; \quad a = 0, \ b = \pm 1$,

so that the unities are $\pm 1, \pm i$. 

If $\epsilon$ is any unity, then $\epsilon\xi$ is said to be associated with $\xi$. The associates 
of $\xi$ are 

    $\xi, i\xi, -\xi, -i\xi$;

and the associates of 1 are the unities. It is clear that if $\xi \mid \eta$ then $\xi\epsilon_1 \mid \eta\epsilon_2$, 
where $\epsilon_1, \epsilon_2$ are any unities. Hence, if $\eta$ is divisible by $\xi$, any associate of 
$\eta$ is divisible by any associate of $\xi$. 
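
The multiplicative property of the norm and the list of unities can be confirmed by direct computation. In the Python sketch below a Gaussian integer $a + bi$ is represented, by an assumption of the example, as the pair `(a, b)`; the function names are ours.

```python
def g_mul(x, y):
    """Product (a + bi)(c + di) = (ac - bd) + (ad + bc)i on pairs."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)


def norm(x):
    """N(a + bi) = a^2 + b^2."""
    a, b = x
    return a * a + b * b


def unities(bound=2):
    """All a + bi with |a|, |b| <= bound and norm 1."""
    span = range(-bound, bound + 1)
    return sorted((a, b) for a in span for b in span if norm((a, b)) == 1)


xi, eta = (2, 1), (3, -4)
print(g_mul(xi, eta), norm(g_mul(xi, eta)), norm(xi) * norm(eta))  # (10, -5) 125 125
print(unities())  # [(-1, 0), (0, -1), (0, 1), (1, 0)], i.e. -1, -i, i, 1
```

Enlarging `bound` cannot produce further unities, since $a^2 + b^2 = 1$ forces $|a| \le 1$ and $|b| \le 1$.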

12.7. Primes in $k(i)$. A prime is an integer, not 0 or a unity, divisible 
only by numbers associated with itself or with 1. We reserve the letter $\pi$ 
for primes.† A prime $\pi$ has no divisors except the eight trivial divisors 

    $1, \pi, -1, -\pi, i, i\pi, -i, -i\pi$.

The associates of a prime are clearly also primes. 

Theorem 212. An integer whose norm is a rational prime is a prime. 

For suppose that $N\xi = p$, and that $\xi = \eta\zeta$. Then 

    $p = N\xi = N\eta N\zeta$.

Hence either $N\eta = 1$ or $N\zeta = 1$, and either $\eta$ or $\zeta$ is a unity; and therefore 
$\xi$ is a prime. Thus $N(2 + i) = 5$, and $2 + i$ is a prime. 


† There will be no danger of confusion with the ordinary use of $\pi$. 

The converse theorem is not true; thus $N3 = 9$, but 3 is a prime. 
For suppose that 

    $3 = (a + bi)(c + di)$.

Then 

    $9 = (a^2 + b^2)(c^2 + d^2)$.

It is impossible that 

    $a^2 + b^2 = c^2 + d^2 = 3$

(since 3 is not the sum of two squares), and therefore either $a^2 + b^2 = 1$ 
or $c^2 + d^2 = 1$, and either $a + bi$ or $c + di$ is a unity. It follows that 3 is 
a prime. 

A rational integer, prime in $k(i)$, must be a rational prime; but not all 
rational primes are prime in $k(i)$. Thus 

    $5 = (2 + i)(2 - i)$.
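
The examples $2 + i$, 3, and 5 can be checked mechanically from the definition of a prime. The Python sketch below is a brute-force search and makes no claim to efficiency: it looks for a divisor $\eta$ with $1 < N\eta < N\xi$. The pair representation `(a, b)` for $a + bi$ and the name `is_gaussian_prime` are assumptions of the example.

```python
from math import isqrt


def is_gaussian_prime(x):
    """True if a + bi (given as (a, b)) is not 0, not a unity, and has only trivial divisors."""
    a, b = x
    n = a * a + b * b
    if n <= 1:
        return False  # 0 and the unities are excluded by definition
    bound = isqrt(n)  # any eta with N(eta) < n has both coordinates within this bound
    for c in range(-bound, bound + 1):
        for d in range(-bound, bound + 1):
            m = c * c + d * d
            if 1 < m < n:
                # eta = c + di divides xi exactly when xi * conj(eta) / N(eta)
                # has integral real and imaginary parts.
                re, im = a * c + b * d, b * c - a * d
                if re % m == 0 and im % m == 0:
                    return False
    return True


print(is_gaussian_prime((2, 1)))  # True: the norm 5 is a rational prime
print(is_gaussian_prime((3, 0)))  # True, although N3 = 9
print(is_gaussian_prime((5, 0)))  # False: 5 = (2 + i)(2 - i)
```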


Theorem 213. Any integer, not 0 or a unity, is divisible by a prime. 

If $\gamma$ is an integer, and not a prime, then 

    $\gamma = \alpha_1\beta_1, \quad N\alpha_1 > 1, \quad N\beta_1 > 1, \quad N\gamma = N\alpha_1 N\beta_1$,

and so 

    $1 < N\alpha_1 < N\gamma$.

If $\alpha_1$ is not a prime, then 

    $\alpha_1 = \alpha_2\beta_2, \quad N\alpha_2 > 1, \quad N\beta_2 > 1$,

    $N\alpha_1 = N\alpha_2 N\beta_2, \quad 1 < N\alpha_2 < N\alpha_1$.

We may continue this process so long as $\alpha_r$ is not prime. Since 

    $N\gamma, N\alpha_1, N\alpha_2, \ldots$

is a decreasing sequence of positive rational integers, we must sooner or 
later come to a prime $\alpha_r$; and if $\alpha_r$ is the first prime in the sequence 
$\gamma, \alpha_1, \alpha_2, \ldots$, then 

    $\gamma = \beta_1\alpha_1 = \beta_1\beta_2\alpha_2 = \cdots = \beta_1\beta_2\beta_3 \cdots \beta_r\alpha_r$,

and so 

    $\alpha_r \mid \gamma$.


Theorem 214. Any integer, not 0 or a unity, is a product of primes. 

If $\gamma$ is not 0 or a unity, it is divisible by a prime $\pi_1$. Hence 

    $\gamma = \pi_1\gamma_1, \quad N\gamma_1 < N\gamma$.

Either $\gamma_1$ is a unity or 

    $\gamma_1 = \pi_2\gamma_2, \quad N\gamma_2 < N\gamma_1$.

Continuing this process we obtain a decreasing sequence 

    $N\gamma, N\gamma_1, N\gamma_2, \ldots$,

of positive rational integers. Hence $N\gamma_r = 1$ for some $r$, and $\gamma_r$ is a unity 
$\epsilon$; and therefore 

    $\gamma = \pi_1\pi_2 \cdots \pi_r\epsilon = \pi_1 \cdots \pi_{r-1}\pi'_r$,

where $\pi'_r = \pi_r\epsilon$ is an associate of $\pi_r$ and so itself a prime. 

12.8. The fundamental theorem of arithmetic in $k(i)$. Theorem 214 
shows that every $\gamma$ can be expressed in the form 

    $\gamma = \pi_1\pi_2 \cdots \pi_r$,

where every $\pi$ is a prime. The fundamental theorem asserts that, apart from 
trivial variations, this representation is unique. 

Theorem 215 (The fundamental theorem for Gaussian integers). The 
expression of an integer as a product of primes is unique, apart from 
the order of the primes, the presence of unities, and ambiguities between 
associated primes. 

We use a process, analogous to Euclid’s algorithm, which depends upon 

Theorem 216. Given any two integers $\gamma, \gamma_1$, of which $\gamma_1 \ne 0$, there is 
an integer $\kappa$ such that 

    $\gamma = \kappa\gamma_1 + \gamma_2, \quad N\gamma_2 < N\gamma_1$.

We shall actually prove more than this, viz. that 

    $N\gamma_2 \le \tfrac{1}{2} N\gamma_1$,

but the essential point, on which the proof of the fundamental theorem 
depends, is what is stated in the theorem. If $c$ and $c_1$ are positive rational 
integers, and $c_1 \ne 0$, there is a $k$ such that 

    $c = kc_1 + c_2, \quad 0 \le c_2 < c_1$.

It is on this that the construction of Euclid’s algorithm depends, and 
Theorem 216 provides the basis for a similar construction in $k(i)$. 

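Since $\gamma_1 \ne 0$, we have 

    $\frac{\gamma}{\gamma_1} = R + Si$,

where $R$ and $S$ are real; in fact $R$ and $S$ are rational, but this is irrelevant. 
We can find two rational integers $x$ and $y$ such that 

    $|R - x| \le \tfrac{1}{2}, \quad |S - y| \le \tfrac{1}{2}$;

and then 

    $\left|\frac{\gamma}{\gamma_1} - (x + iy)\right| = |(R - x) + i(S - y)| = \{(R - x)^2 + (S - y)^2\}^{1/2} \le \frac{1}{\sqrt{2}}$.

If we take 

    $\kappa = x + iy, \quad \gamma_2 = \gamma - \kappa\gamma_1$,

we have 

    $|\gamma - \kappa\gamma_1| \le 2^{-1/2} |\gamma_1|$,

and so, squaring, 

    $N\gamma_2 = N(\gamma - \kappa\gamma_1) \le \tfrac{1}{2} N\gamma_1$.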
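
The choice of $\kappa$ in this proof can be carried out in exact integer arithmetic, since $\gamma/\gamma_1 = \gamma\bar{\gamma}_1/N\gamma_1$ has rational real and imaginary parts. The Python sketch below assumes the pair representation `(a, b)` for $a + bi$; the name `g_divmod` is ours.

```python
def g_divmod(g, g1):
    """Return (kappa, g2) with g = kappa*g1 + g2 and N(g2) <= N(g1)/2."""
    (a, b), (c, d) = g, g1
    n1 = c * c + d * d
    if n1 == 0:
        raise ZeroDivisionError("gamma_1 must not be 0")
    # g / g1 = g * conj(g1) / N(g1) = (r_num + s_num*i) / n1, so R = r_num/n1, S = s_num/n1.
    r_num = a * c + b * d
    s_num = b * c - a * d
    # Nearest integers: floor(R + 1/2) and floor(S + 1/2), computed without fractions.
    x = (2 * r_num + n1) // (2 * n1)
    y = (2 * s_num + n1) // (2 * n1)
    # g2 = g - kappa*g1, where kappa*g1 = (xc - yd) + (xd + yc)i.
    g2 = (a - (x * c - y * d), b - (x * d + y * c))
    return (x, y), g2


kappa, g2 = g_divmod((27, 23), (8, 1))
print(kappa, g2)  # (4, 2) (-3, 3): N(g2) = 18 is at most N(8 + i)/2 = 32.5
```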

We now apply Theorem 216 to obtain an analogue of Euclid’s algorithm. 
If $\gamma$ and $\gamma_1$ are given, and $\gamma_1 \ne 0$, we have 

    $\gamma = \kappa\gamma_1 + \gamma_2 \quad (N\gamma_2 < N\gamma_1)$.

If $\gamma_2 \ne 0$, we have 

    $\gamma_1 = \kappa_1\gamma_2 + \gamma_3 \quad (N\gamma_3 < N\gamma_2)$,

and so on. Since 

    $N\gamma_1, N\gamma_2, \ldots$

is a decreasing sequence of non-negative rational integers, there must be 
an $n$ for which 

    $N\gamma_{n+1} = 0, \quad \gamma_{n+1} = 0$,

and the last steps of the algorithm will be 

    $\gamma_{n-2} = \kappa_{n-2}\gamma_{n-1} + \gamma_n \quad (N\gamma_n < N\gamma_{n-1})$,

    $\gamma_{n-1} = \kappa_{n-1}\gamma_n$.

It now follows, as in the proof of Theorem 207, that $\gamma_n$ is a common 
divisor of $\gamma$ and $\gamma_1$, and that every common divisor of $\gamma$ and $\gamma_1$ is a 
divisor of $\gamma_n$. 
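
The algorithm in $k(i)$ may be sketched in the same way as that for rational integers. In the Python illustration below Gaussian integers are pairs `(a, b)` standing for $a + bi$, and the remainder at each step is chosen by rounding the quotient to the nearest Gaussian integer; the names `g_rem` and `g_common_divisor` are assumptions of the example. The result is the last non-zero remainder $\gamma_n$, which is fixed only up to multiplication by a unity.

```python
def g_rem(g, g1):
    """Remainder g2 = g - kappa*g1, with kappa the nearest Gaussian integer to g/g1."""
    (a, b), (c, d) = g, g1
    n1 = c * c + d * d
    x = (2 * (a * c + b * d) + n1) // (2 * n1)  # nearest integer to Re(g/g1)
    y = (2 * (b * c - a * d) + n1) // (2 * n1)  # nearest integer to Im(g/g1)
    return (a - (x * c - y * d), b - (x * d + y * c))


def g_common_divisor(g, g1):
    """Last non-zero remainder gamma_n of the algorithm applied to g, g1."""
    while g1 != (0, 0):
        g, g1 = g1, g_rem(g, g1)  # the norm of g1 strictly decreases
    return g


print(g_common_divisor((5, 0), (2, 1)))  # (2, 1): 2 + i divides 5
print(g_common_divisor((3, 0), (2, 1)))  # (0, 1): the unity i, so 3 and 2 + i have no common prime factor
```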

We have nothing at this stage corresponding exactly to Theorem 207, 
since we have not yet defined ‘highest common divisor’. If $\zeta$ is a common 
divisor of $\gamma$ and $\gamma_1$, and every common divisor of $\gamma$ and $\gamma_1$ is a divisor 
of $\zeta$, we call $\zeta$ a highest common divisor of $\gamma$ and $\gamma_1$, and write 
$\zeta = (\gamma, \gamma_1)$. Thus $\gamma_n$ is a highest common divisor of $\gamma$ and $\gamma_1$. The property of 
$(\gamma, \gamma_1)$ corresponding to that proved in Theorem 208 is thus absorbed into 
its definition. 

The highest common divisor is not unique, since any associate of a 
highest common divisor is also a highest common divisor. If $\eta$ and $\zeta$ are 
each highest common divisors, then, by the definition, 

    $\eta \mid \zeta, \quad \zeta \mid \eta$,

and so 

    $\zeta = \phi\eta, \quad \eta = \theta\zeta = \theta\phi\eta, \quad \theta\phi = 1$.

Hence $\phi$ is a unity and $\zeta$ an associate of $\eta$, and the highest common divisor 
is unique except for ambiguity between associates. 

It will be noticed that we defined the highest common divisor of two 
numbers of $k(1)$ differently, viz. as the greatest among the common divisors, 
and proved as a theorem that it possesses the property which we take 
as our definition here. We might define the highest common divisors of two 
integers of $k(i)$ as those whose norm is greatest, but the definition which 
we have adopted lends itself more naturally to generalization. 

We now use the algorithm to prove the analogue of Theorem 209, viz. 

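Theorem 217. If $(\gamma, \gamma_1) = 1$ and $\gamma_1 \mid \beta\gamma$, then $\gamma_1 \mid \beta$.

We multiply the algorithm throughout by $\beta$ and find that

$$(\beta\gamma, \beta\gamma_1) = \beta\gamma_n.$$

Since $(\gamma, \gamma_1) = 1$, $\gamma_n$ is a unity, and so

$$(\beta\gamma, \beta\gamma_1) = \beta.$$

Now $\gamma_1 \mid \beta\gamma$, by hypothesis, and $\gamma_1 \mid \beta\gamma_1$. Hence, by the definition of the
highest common divisor,

$$\gamma_1 \mid (\beta\gamma, \beta\gamma_1)$$

or $\gamma_1 \mid \beta$.

If $\pi$ is prime, and $(\pi, \gamma) = \mu$, then $\mu \mid \pi$ and $\mu \mid \gamma$. Since $\mu \mid \pi$, either
(1) $\mu$ is a unity, and so $(\pi, \gamma) = 1$, or (2) $\mu$ is an associate of $\pi$, and so
$\pi \mid \gamma$. Hence, if we take $\gamma_1 = \pi$ in Theorem 217, we obtain the analogue
of Euclid’s Theorem 3, viz.

Theorem 218. If $\pi \mid \beta\gamma$, then $\pi \mid \beta$ or $\pi \mid \gamma$.

From this the fundamental theorem for k(i) follows by the argument
used for k(1) in § 1.3.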

12.9. The integers of $k(\rho)$. We conclude this chapter with a more
summary discussion of the integers

$$\xi = a + b\rho$$

defined in § 12.2. Throughout this section ‘integer’ means ‘integer of $k(\rho)$’.
We define divisor, unity, associate, and prime in $k(\rho)$ as in k(i); but the
norm of $\xi = a + b\rho$ is

$$N\xi = (a + b\rho)(a + b\rho^2) = a^2 - ab + b^2.$$

Since

$$a^2 - ab + b^2 = (a - \tfrac12 b)^2 + \tfrac34 b^2,$$

$N\xi$ is positive except when $\xi = 0$.
Since

$$|a + b\rho|^2 = a^2 - ab + b^2 = N(a + b\rho),$$

we have

$$N\alpha N\beta = N(\alpha\beta), \quad N\alpha N\beta \cdots = N(\alpha\beta \cdots),$$

as in k(i).

Theorems 210, 212, 213, and 214 remain true in $k(\rho)$; and the proofs
are the same except for the difference in the form of the norm.

The unities are given by

$$a^2 - ab + b^2 = 1,$$

or

$$(2a - b)^2 + 3b^2 = 4.$$

The only solutions of this equation are

$$a = \pm 1,\ b = 0; \quad a = 0,\ b = \pm 1; \quad a = 1,\ b = 1; \quad a = -1,\ b = -1:$$

so that the unities are

$$\pm 1, \quad \pm\rho, \quad \pm(1 + \rho)$$

or

$$\pm 1, \quad \pm\rho, \quad \pm\rho^2.$$
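
The list of unities can be confirmed by direct enumeration. The following is a small sketch, assuming an integer a + bρ is represented by the pair (a, b); the name `e_norm` is ours. The search box is justified by the equation (2a - b)^2 + 3b^2 = 4, which forces |b| ≤ 1 and |a| ≤ 1.

```python
def e_norm(a, b):
    """Norm of a + b*rho, namely a^2 - ab + b^2."""
    return a * a - a * b + b * b

# (2a - b)^2 + 3b^2 = 4 forces |a| <= 1 and |b| <= 1, so a slightly
# larger box certainly contains every solution of N(a + b*rho) = 1.
unities = sorted(
    (a, b)
    for a in range(-2, 3)
    for b in range(-2, 3)
    if e_norm(a, b) == 1
)
print(unities)
```

The six pairs found are (±1, 0), (0, ±1), and ±(1, 1), in agreement with the text.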

Any number whose norm is a rational prime is a prime; thus $1 - \rho$ is
a prime, since $N(1 - \rho) = 3$. The converse is false; for example, 2 is a
prime. For if

$$2 = (a + b\rho)(c + d\rho),$$

then

$$4 = (a^2 - ab + b^2)(c^2 - cd + d^2).$$

Hence either $a + b\rho$ or $c + d\rho$ is a unity, or

$$a^2 - ab + b^2 = \pm 2, \quad (2a - b)^2 + 3b^2 = \pm 8,$$

which is impossible.

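The fundamental theorem is true in $k(\rho)$ also, and depends on a theorem
verbally identical with Theorem 216.

Theorem 219. Given any two integers $\gamma$, $\gamma_1$, of which $\gamma_1 \neq 0$, there is
an integer $\kappa$ such that

$$\gamma = \kappa\gamma_1 + \gamma_2, \quad N\gamma_2 < N\gamma_1.$$

For

$$\frac{\gamma}{\gamma_1} = \frac{a + b\rho}{c + d\rho} = \frac{(a + b\rho)(c + d\rho^2)}{(c + d\rho)(c + d\rho^2)} = \frac{ac + bd - ad + (bc - ad)\rho}{c^2 - cd + d^2} = R + S\rho,$$

say. We can find two rational integers x and y such that

$$|R - x| \le \tfrac12, \quad |S - y| \le \tfrac12,$$

and then

$$\left|\frac{\gamma}{\gamma_1} - (x + y\rho)\right|^2 = (R - x)^2 - (R - x)(S - y) + (S - y)^2 \le \tfrac34.$$

Hence, if $\kappa = x + y\rho$, $\gamma_2 = \gamma - \kappa\gamma_1$, we have

$$N\gamma_2 = N(\gamma - \kappa\gamma_1) \le \tfrac34 N\gamma_1 < N\gamma_1.$$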
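
The division step of Theorem 219 can be made concrete as follows. This is a sketch under our own conventions, not the book's: the integer a + bρ is stored as the pair (a, b), products are reduced with ρ^2 = -1 - ρ, and the names `e_mul`, `e_norm`, `nearest`, and `e_divide` are hypothetical helpers.

```python
# An integer a + b*rho of k(rho) is modelled as (a, b), with rho^2 = -1 - rho.

def e_mul(u, v):
    """Product (a + b*rho)(c + d*rho), reduced using rho^2 = -1 - rho."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)

def e_norm(u):
    """Norm a^2 - ab + b^2."""
    a, b = u
    return a * a - a * b + b * b

def nearest(p, q):
    """An integer within 1/2 of p/q, for q > 0."""
    return (2 * p + q) // (2 * q)

def e_divide(g, g1):
    """Return (kappa, g2) with g = kappa*g1 + g2 and N(g2) <= (3/4) N(g1)."""
    c, d = g1
    n = e_norm(g1)
    # c + d*rho^2 = (c - d) - d*rho is the conjugate of g1, so that
    # g * conjugate(g1) = (R + S*rho) * N(g1).
    num = e_mul(g, (c - d, -d))
    kappa = (nearest(num[0], n), nearest(num[1], n))
    prod = e_mul(kappa, g1)
    return kappa, (g[0] - prod[0], g[1] - prod[1])
```

With `g = (5, 3)` and `g1 = (2, 1)` the exact quotient is 8/3 + ρ/3, and `e_divide` returns the quotient `(3, 0)` with remainder `(-1, 0)`.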

The fundamental theorem for $k(\rho)$ follows from Theorem 219 by the
argument used in § 12.8.

Theorem 220. [The fundamental theorem for $k(\rho)$.] The expression of
an integer of $k(\rho)$ as a product of primes is unique, apart from the order
of the primes, the presence of unities, and ambiguities between associated
primes.

We conclude with a few trivial propositions about the integers of $k(\rho)$
which are of no intrinsic interest but will be required in Ch. XIII.

Theorem 221. $\lambda = 1 - \rho$ is a prime.

This has been proved already.

Theorem 222. All integers of $k(\rho)$ fall into three classes (mod $\lambda$),
typified by 0, 1, and −1.

The definitions of a congruence to modulus $\lambda$, a residue (mod $\lambda$), and a
class of residues (mod $\lambda$), are the same as in k(1).

If $\gamma$ is any integer of $k(\rho)$, we have

$$\gamma = a + b\rho = a + b - b\lambda \equiv a + b \pmod{\lambda}.$$

Since $3 = (1 - \rho)(1 - \rho^2)$, $\lambda \mid 3$; and since a + b has one of the three residues
0, 1, −1 (mod 3), $\gamma$ has one of the same three residues (mod $\lambda$). These
residues are incongruent, since neither N1 = 1 nor N2 = 4 is divisible by
$N\lambda = 3$.

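Theorem 223. 3 is associated with $\lambda^2$.

For

$$\lambda^2 = 1 - 2\rho + \rho^2 = -3\rho.$$

Theorem 224. The numbers $\pm(1 - \rho)$, $\pm(1 - \rho^2)$, $\pm\rho(1 - \rho)$ are all
associated with $\lambda$.

For

$$\pm(1 - \rho) = \pm\lambda, \quad \pm(1 - \rho^2) = \mp\lambda\rho^2, \quad \pm\rho(1 - \rho) = \pm\lambda\rho.$$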


NOTES 

The terminology and notation of this chapter, and also of Chapters 14 and 15, has become
out of date. In particular k(1), k(i), and $k(\rho)$ are alternatively denoted $\mathbb{Q}$, $\mathbb{Q}(i)$, and $\mathbb{Q}(\rho)$.
Moreover ‘unities’ are alternatively referred to merely as ‘units’.

§ 12.1. The Gaussian integers were used first by Gauss in his researches on biquadratic 
reciprocity. See in particular his memoirs entitled ‘Theoria residuorum biquadraticorum’, 
Werke , ii. 67-148. Gauss (here and in his memoirs on algebraic equations, Werke , iii. 3-64) 
was the first mathematician to use complex numbers in a really confident and scientific 
way. 

The numbers $a + b\rho$ were introduced by Eisenstein and Jacobi in their work on cubic
reciprocity. See Bachmann, Allgemeine Arithmetik der Zahlkörper, 142.

§ 12.5. We owe the substance of these remarks to Prof. S. Bochner. 

Professor A. A. Mullin drew my attention to Euclid ix. 14, the theorem that, if n is
the least number divisible by each of the primes $p_j$, then n is not divisible by any
other prime. This may perhaps be regarded as a further step on Euclid’s part towards the
Fundamental Theorem.



XIII 

SOME DIOPHANTINE EQUATIONS 

13.1. Fermat’s last theorem. ‘Fermat’s last theorem’ asserts that the
equation

(13.1.1) $x^n + y^n = z^n$,

where n is an integer greater than 2, has no integral solutions, except the
trivial solutions in which one of the variables is 0. The theorem has never
been proved for all n,* or even in an infinity of genuinely distinct cases,
but it is known to be true for 2 < n < 619. In this chapter we shall be
concerned only with the two simplest cases of the theorem, in which n = 3
and n = 4. The case n = 4 is easy, and the case n = 3 provides an excellent
illustration of the use of the ideas of Ch. XII.

13.2. The equation $x^2 + y^2 = z^2$. The equation (13.1.1) is soluble
when n = 2; the most familiar solutions are 3, 4, 5 and 5, 12, 13. We
dispose of this problem first.

It is plain that we may suppose x, y, z positive, without loss of generality.
Next

$$d \mid x \;.\; d \mid y \rightarrow d \mid z.$$

Hence, if x, y, z is a solution with (x, y) = d, then x = dx', y = dy', z = dz',
and x', y', z' is a solution with (x', y') = 1. We may therefore suppose that
(x, y) = 1, the general solution being a multiple of a solution satisfying
this condition. Finally

$$x \equiv 1 \pmod 2 \;.\; y \equiv 1 \pmod 2 \rightarrow z^2 \equiv 2 \pmod 4,$$

which is impossible; so that one of x and y must be odd and the other even.

It is therefore sufficient for our purpose to prove the theorem which
follows.

Theorem 225. The most general solution of the equation

(13.2.1) $x^2 + y^2 = z^2$,

satisfying the conditions

(13.2.2) $x > 0, \quad y > 0, \quad z > 0, \quad (x, y) = 1, \quad 2 \mid x$,

is

(13.2.3) $x = 2ab, \quad y = a^2 - b^2, \quad z = a^2 + b^2$,

where a, b are integers of opposite parity and

(13.2.4) $(a, b) = 1, \quad a > b > 0$.

There is a (1,1) correspondence between different values of a, b and
different values of x, y, z.

* This has now been resolved. See the end of chapter notes.


First, let us assume (13.2.1) and (13.2.2). Since $2 \mid x$ and (x, y) = 1,
y and z are odd and (y, z) = 1. Hence $\tfrac12(z - y)$ and $\tfrac12(z + y)$ are integral
and

$$\left(\frac{z - y}{2}, \frac{z + y}{2}\right) = 1.$$

By (13.2.1),

$$\left(\frac{x}{2}\right)^2 = \left(\frac{z + y}{2}\right)\left(\frac{z - y}{2}\right),$$

and the two factors on the right, being coprime, must both be squares.
Hence

$$\frac{z + y}{2} = a^2, \quad \frac{z - y}{2} = b^2,$$

where

$$a > 0, \quad b > 0, \quad a > b, \quad (a, b) = 1.$$

Also

$$a + b \equiv a^2 + b^2 = z \equiv 1 \pmod 2,$$

and a and b are of opposite parity. Hence any solution of (13.2.1), satisfying
(13.2.2), is of the form (13.2.3); and a and b are of opposite parity and satisfy
(13.2.4).

Next, let us assume that a and b are of opposite parity and satisfy (13.2.4).
Then

$$x^2 + y^2 = 4a^2b^2 + (a^2 - b^2)^2 = (a^2 + b^2)^2 = z^2,$$
$$x > 0, \quad y > 0, \quad z > 0, \quad 2 \mid x.$$

If (x, y) = d, then $d \mid z$, and so

$$d \mid y = a^2 - b^2, \quad d \mid z = a^2 + b^2;$$

and therefore $d \mid 2a^2$, $d \mid 2b^2$. Since (a, b) = 1, d must be 1 or 2, and the
second alternative is excluded because y is odd. Hence (x, y) = 1.

Finally, if y and z are given, $a^2$ and $b^2$, and consequently a and b, are
uniquely determined, so that different values of x, y, and z correspond to
different values of a and b.
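
Theorem 225 gives a direct recipe for listing solutions. The sketch below, with the hypothetical name `primitive_triples` and a bound `limit` on z that is our own addition, runs through the pairs a, b allowed by (13.2.4) and applies (13.2.3).

```python
from math import gcd

def primitive_triples(limit):
    """Solutions (x, y, z) of (13.2.1) subject to (13.2.2), with z <= limit."""
    found = []
    a = 2
    while a * a + 1 <= limit:          # smallest z for this a is a^2 + 1
        for b in range(1, a):
            # (13.2.4) together with the opposite-parity condition
            if (a - b) % 2 == 1 and gcd(a, b) == 1:
                z = a * a + b * b
                if z <= limit:
                    found.append((2 * a * b, a * a - b * b, z))
        a += 1
    return sorted(found, key=lambda t: t[2])

print(primitive_triples(30))
```

With `limit = 30` this lists (4, 3, 5), (12, 5, 13), (8, 15, 17), (24, 7, 25), and (20, 21, 29); in each triple x is the even member, as (13.2.2) requires.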

13.3. The equation $x^4 + y^4 = z^4$. We now apply Theorem 225 to the
proof of Fermat’s theorem for n = 4. This is the only ‘easy’ case of the
theorem. Actually we prove rather more.

Theorem 226. There are no positive integral solutions of

(13.3.1) $x^4 + y^4 = z^2$.

Suppose that u is the least number for which

(13.3.2) $x^4 + y^4 = u^2 \quad (x > 0,\ y > 0,\ u > 0)$

has a solution. Then (x, y) = 1, for otherwise we can divide through by
$(x, y)^4$ and so replace u by a smaller number. Hence at least one of x and y
is odd, and

$$u^2 = x^4 + y^4 \equiv 1 \text{ or } 2 \pmod 4.$$

Since $u^2 \equiv 2 \pmod 4$ is impossible, u is odd, and just one of x and y is
even.

If x, say, is even, then, by Theorem 225,

$$x^2 = 2ab, \quad y^2 = a^2 - b^2, \quad u = a^2 + b^2,$$
$$a > 0, \quad b > 0, \quad (a, b) = 1,$$

and a and b are of opposite parity. If a is even and b odd, then

$$y^2 \equiv -1 \pmod 4,$$

which is impossible; so that a is odd and b even, and say b = 2c.

Next

$$(\tfrac12 x)^2 = ac, \quad (a, c) = 1;$$

and so

$$a = d^2, \quad c = f^2, \quad d > 0, \quad f > 0, \quad (d, f) = 1,$$

and d is odd. Hence

$$y^2 = a^2 - b^2 = d^4 - 4f^4,$$
$$(2f^2)^2 + y^2 = (d^2)^2,$$

and no two of $2f^2$, y, $d^2$ have a common factor.

Applying Theorem 225 again, we obtain

$$2f^2 = 2lm, \quad d^2 = l^2 + m^2, \quad l > 0, \quad m > 0, \quad (l, m) = 1.$$

Since

$$f^2 = lm, \quad (l, m) = 1,$$

we have

$$l = r^2, \quad m = s^2 \quad (r > 0,\ s > 0),$$

and so

$$r^4 + s^4 = d^2.$$

But

$$d \le d^2 = a \le a^2 < a^2 + b^2 = u,$$

and so u is not the least number for which (13.3.2) is possible. This
contradiction proves the theorem.

The method of proof which we have used, and which was invented and
applied to many problems by Fermat, is known as the ‘method of descent’.
If a proposition P(n) is true for some positive integer n, there is a smallest
such integer. If P(n), for any positive n, implies P(n') for some smaller
positive n', then there is no such smallest integer; and the contradiction
shows that P(n) is false for every n.
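
A numerical search cannot replace the descent argument, but it makes the content of Theorem 226 tangible. The sketch below (the name `fourth_power_solutions` and the search bound are our assumptions) looks for positive x ≤ y up to a bound with x^4 + y^4 a perfect square.

```python
from math import isqrt

def fourth_power_solutions(bound):
    """All (x, y, z) with 1 <= x <= y <= bound and x^4 + y^4 = z^2."""
    hits = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            s = x ** 4 + y ** 4
            z = isqrt(s)               # exact integer square root
            if z * z == s:
                hits.append((x, y, z))
    return hits

print(fourth_power_solutions(60))      # Theorem 226 predicts an empty list
```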

13.4. The equation $x^3 + y^3 = z^3$. If Fermat’s theorem is true for some
n, it is true for any multiple of n, since $x^{ln} + y^{ln} = z^{ln}$ is

$$(x^l)^n + (y^l)^n = (z^l)^n.$$

The theorem is therefore true generally if it is true (a) when n = 4 (as we
have shown) and (b) when n is an odd prime. The only case of (b) which
we can discuss here is the case n = 3.

The natural method of attack, after Ch. XII, is to write Fermat’s equation
in the form

$$(x + y)(x + \rho y)(x + \rho^2 y) = z^3,$$

and consider the structure of the various factors in $k(\rho)$. As in § 13.3, we
prove rather more than Fermat’s theorem.

Theorem 227. There are no solutions of

$$\xi^3 + \eta^3 + \zeta^3 = 0 \quad (\xi \neq 0,\ \eta \neq 0,\ \zeta \neq 0)$$

in integers of $k(\rho)$. In particular, there are no solutions of

$$x^3 + y^3 = z^3$$

in rational integers, except the trivial solutions in which one of x, y, z is 0.

In the proof that follows, Greek letters denote integers in $k(\rho)$, and $\lambda$ is
the prime $1 - \rho$.† We may plainly suppose that

(13.4.1) $(\eta, \zeta) = (\zeta, \xi) = (\xi, \eta) = 1$.

We base the proof on four lemmas (Theorems 228-31).

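Theorem 228. If $\omega$ is not divisible by $\lambda$, then

$$\omega^3 \equiv \pm 1 \pmod{\lambda^4}.$$

Since $\omega$ is congruent to one of 0, 1, −1, by Theorem 222, and $\lambda \nmid \omega$,
we have

$$\omega \equiv \pm 1 \pmod{\lambda}.$$

We can therefore choose $\alpha = \pm\omega$ so that

$$\alpha \equiv 1 \pmod{\lambda}, \quad \alpha = 1 + \beta\lambda.$$

Then

$$\pm(\omega^3 \mp 1) = \alpha^3 - 1 = (\alpha - 1)(\alpha - \rho)(\alpha - \rho^2)$$
$$= \beta\lambda(\beta\lambda + 1 - \rho)(\beta\lambda + 1 - \rho^2)$$
$$= \lambda^3\beta(\beta + 1)(\beta - \rho^2),$$

since $1 - \rho^2 = \lambda(1 + \rho) = -\lambda\rho^2$. Also

$$\rho^2 \equiv 1 \pmod{\lambda},$$

so that

$$\beta(\beta + 1)(\beta - \rho^2) \equiv \beta(\beta + 1)(\beta - 1) \pmod{\lambda}.$$

But one of $\beta$, $\beta + 1$, $\beta - 1$ is divisible by $\lambda$, by Theorem 222; and so

$$\pm(\omega^3 \mp 1) \equiv 0 \pmod{\lambda^4}$$

or

$$\omega^3 \equiv \pm 1 \pmod{\lambda^4}.$$

† See Theorem 221.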
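
Theorem 228 is easy to test numerically. The sketch below rests on two observations that follow from Theorems 222 and 223 but are not spelled out in the text: the fourth power of λ is 9ρ^2, an associate of 9, so a congruence to modulus λ^4 is the same as one to modulus 9; and a + bρ is divisible by λ exactly when 3 divides a + b. Integers a + bρ are stored as pairs (a, b), and the function names are ours.

```python
def e_mul(u, v):
    """Product (a + b*rho)(c + d*rho), reduced using rho^2 = -1 - rho."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c - b * d)

def divisible_by_lambda(w):
    """a + b*rho is congruent to a + b (mod lambda), and lambda | n iff 3 | n."""
    return (w[0] + w[1]) % 3 == 0

def cube_residue(w):
    """Return +1 or -1 when w^3 is congruent to that value (mod 9), else None."""
    a, b = e_mul(e_mul(w, w), w)
    if b % 9 != 0:
        return None
    if (a - 1) % 9 == 0:
        return 1
    if (a + 1) % 9 == 0:
        return -1
    return None

# Every integer in a small box that is prime to lambda should have cube = +-1.
checked = [
    cube_residue((a, b))
    for a in range(-6, 7)
    for b in range(-6, 7)
    if not divisible_by_lambda((a, b))
]
print(all(r in (1, -1) for r in checked))
```

For example, ω = 1 + 3ρ has cube 1 − 18ρ, which is congruent to 1 (mod 9).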

Theorem 229. If $\xi^3 + \eta^3 + \zeta^3 = 0$, then one of $\xi$, $\eta$, $\zeta$ is divisible by $\lambda$.

Let us suppose the contrary. Then

$$0 = \xi^3 + \eta^3 + \zeta^3 \equiv \pm 1 \pm 1 \pm 1 \pmod{\lambda^4},$$

and so $\pm 1 \equiv 0$ or $\pm 3 \equiv 0$, i.e. $\lambda^4 \mid 1$ or $\lambda^4 \mid 3$. The first hypothesis is
untenable because $\lambda$ is not a unity; and the second because 3 is an associate
of $\lambda^2$* and therefore not divisible by $\lambda^4$. Hence one of $\xi$, $\eta$, $\zeta$ must be
divisible by $\lambda$.

We may therefore suppose that $\lambda \mid \zeta$, and that

$$\zeta = \lambda^n\gamma,$$

where $\lambda \nmid \gamma$. Then $\lambda \nmid \xi$, $\lambda \nmid \eta$ by (13.4.1), and we have to prove the
impossibility of

(13.4.2) $\xi^3 + \eta^3 + \lambda^{3n}\gamma^3 = 0$,

where

(13.4.3) $(\xi, \eta) = 1, \quad n \ge 1, \quad \lambda \nmid \xi, \quad \lambda \nmid \eta, \quad \lambda \nmid \gamma$.

* Theorem 223.

It is convenient to prove more, viz. that

(13.4.4) $\xi^3 + \eta^3 + \epsilon\lambda^{3n}\gamma^3 = 0$

cannot be satisfied by any $\xi$, $\eta$, $\gamma$, subject to (13.4.3) and any unity $\epsilon$.

Theorem 230. If $\xi$, $\eta$, and $\gamma$ satisfy (13.4.3) and (13.4.4), then $n \ge 2$.

By Theorem 228,

$$-\epsilon\lambda^{3n}\gamma^3 = \xi^3 + \eta^3 \equiv \pm 1 \pm 1 \pmod{\lambda^4}.$$

If the signs are the same, then

$$-\epsilon\lambda^{3n}\gamma^3 \equiv \pm 2 \pmod{\lambda^4},$$

which is impossible because $\lambda \nmid 2$. Hence the signs are opposite, and

$$-\epsilon\lambda^{3n}\gamma^3 \equiv 0 \pmod{\lambda^4}.$$

Since $\lambda \nmid \gamma$, $n \ge 2$.

Theorem 231. If (13.4.4) is possible for n = m > 1, then it is possible
for n = m − 1.

Theorem 231 represents the critical stage in the proof of Theorem 227;
when it is proved, Theorem 227 follows immediately. For if (13.4.4) is
possible for any n, it is possible for n = 1, in contradiction to Theorem 230.
The argument is another example of the ‘method of descent’.

Our hypothesis is that

(13.4.5) $-\epsilon\lambda^{3m}\gamma^3 = (\xi + \eta)(\xi + \rho\eta)(\xi + \rho^2\eta)$.

The differences of the factors on the right are

$$\eta\lambda, \quad \rho\eta\lambda, \quad \rho^2\eta\lambda,$$

all associates of $\eta\lambda$. Each of them is divisible by $\lambda$ but not by $\lambda^2$ (since
$\lambda \nmid \eta$).

Since $m \ge 2$, 3m > 3, and one of the three factors must be divisible by
$\lambda^2$. The other two factors must be divisible by $\lambda$ (since the differences are
divisible), but not by $\lambda^2$ (since the differences are not). We may suppose
that the factor divisible by $\lambda^2$ is $\xi + \eta$; if it were one of the other factors,
we could replace $\eta$ by one of its associates. We have then

(13.4.6) $\xi + \eta = \lambda^{3m-2}\kappa_1, \quad \xi + \rho\eta = \lambda\kappa_2, \quad \xi + \rho^2\eta = \lambda\kappa_3$,

where none of $\kappa_1$, $\kappa_2$, $\kappa_3$ is divisible by $\lambda$.

If $\delta \mid \kappa_2$ and $\delta \mid \kappa_3$, then $\delta$ also divides

$$\kappa_2 - \kappa_3 = \rho\eta$$

and

$$\rho\kappa_3 - \rho^2\kappa_2 = \rho\xi,$$

and therefore both $\xi$ and $\eta$. Hence $\delta$ is a unity and $(\kappa_2, \kappa_3) = 1$.

Similarly $(\kappa_3, \kappa_1) = 1$ and $(\kappa_1, \kappa_2) = 1$.

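Substituting from (13.4.6) into (13.4.5), we obtain

$$-\epsilon\gamma^3 = \kappa_1\kappa_2\kappa_3.$$

Hence each of $\kappa_1$, $\kappa_2$, $\kappa_3$ is an associate of a cube, so that

$$\xi + \eta = \lambda^{3m-2}\kappa_1 = \epsilon_1\lambda^{3m-2}\theta^3, \quad \xi + \rho\eta = \epsilon_2\lambda\phi^3, \quad \xi + \rho^2\eta = \epsilon_3\lambda\psi^3,$$

where $\theta$, $\phi$, $\psi$ have no common factor and are not divisible by $\lambda$, and $\epsilon_1$,
$\epsilon_2$, $\epsilon_3$ are unities. It follows that

$$0 = (1 + \rho + \rho^2)(\xi + \eta) = \xi + \eta + \rho(\xi + \rho\eta) + \rho^2(\xi + \rho^2\eta)$$
$$= \epsilon_1\lambda^{3m-2}\theta^3 + \epsilon_2\rho\lambda\phi^3 + \epsilon_3\rho^2\lambda\psi^3;$$

and so that

(13.4.7) $\phi^3 + \epsilon_4\psi^3 + \epsilon_5\lambda^{3m-3}\theta^3 = 0$,

where $\epsilon_4 = \epsilon_3\rho/\epsilon_2$ and $\epsilon_5 = \epsilon_1/(\epsilon_2\rho)$ are also unities.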

Now $m \ge 2$ and so

$$\phi^3 + \epsilon_4\psi^3 \equiv 0 \pmod{\lambda^2}$$

(in fact, mod $\lambda^3$). But $\lambda \nmid \phi$ and $\lambda \nmid \psi$, and therefore, by Theorem 228,

$$\phi^3 \equiv \pm 1 \pmod{\lambda^2}, \quad \psi^3 \equiv \pm 1 \pmod{\lambda^2}$$

(in fact, mod $\lambda^4$). Hence

$$\pm 1 \pm \epsilon_4 \equiv 0 \pmod{\lambda^2}.$$

Here $\epsilon_4$ is $\pm 1$, $\pm\rho$, or $\pm\rho^2$. But none of

$$\pm 1 \pm \rho, \quad \pm 1 \pm \rho^2$$

is divisible by $\lambda^2$, since each is an associate of 1 or of $\lambda$; and therefore

$$\epsilon_4 = \pm 1.$$

If $\epsilon_4 = 1$, (13.4.7) is an equation of the type required. If $\epsilon_4 = -1$,
we replace $\psi$ by $-\psi$. In either case we have proved Theorem 231 and
therefore Theorem 227.

13.5. The equation $x^3 + y^3 = 3z^3$. Almost the same reasoning will
prove

Theorem 232. The equation

$$x^3 + y^3 = 3z^3$$

has no solutions in integers, except the trivial solutions in which z = 0.

The proof is, as might be expected, substantially the same as that of
Theorem 227, since 3 is an associate of $\lambda^2$. We again prove more, viz. that
there are no solutions of

(13.5.1) $\xi^3 + \eta^3 + \epsilon\lambda^{3n+2}\gamma^3 = 0$,

where

$$(\xi, \eta) = 1, \quad \lambda \nmid \gamma,$$

in integers of $k(\rho)$. And again we prove the theorem by proving two
propositions, viz.

(a) if there is a solution, then n > 0;

(b) if there is a solution for $n = m \ge 1$, then there is a solution for
n = m − 1;

which are contradictory if there is a solution for any n.

We have

$$(\xi + \eta)(\xi + \rho\eta)(\xi + \rho^2\eta) = -\epsilon\lambda^{3m+2}\gamma^3.$$

Hence at least one factor on the left, and therefore every factor, is divisible
by $\lambda$; and hence m > 0. It then follows that 3m + 2 > 3 and that one factor
is divisible by $\lambda^2$, and (as in § 13.4) only one. We have therefore

$$\xi + \eta = \lambda^{3m}\kappa_1, \quad \xi + \rho\eta = \lambda\kappa_2, \quad \xi + \rho^2\eta = \lambda\kappa_3,$$

the $\kappa$ being coprime in pairs and not divisible by $\lambda$.

Hence, as in § 13.4,

$$-\epsilon\gamma^3 = \kappa_1\kappa_2\kappa_3,$$

and $\kappa_1$, $\kappa_2$, $\kappa_3$ are the associates of cubes, so that

$$\xi + \eta = \epsilon_1\lambda^{3m}\theta^3, \quad \xi + \rho\eta = \epsilon_2\lambda\phi^3, \quad \xi + \rho^2\eta = \epsilon_3\lambda\psi^3.$$

It then follows that

$$0 = \xi + \eta + \rho(\xi + \rho\eta) + \rho^2(\xi + \rho^2\eta)$$
$$= \epsilon_1\lambda^{3m}\theta^3 + \epsilon_2\rho\lambda\phi^3 + \epsilon_3\rho^2\lambda\psi^3,$$
$$\phi^3 + \epsilon_4\psi^3 + \epsilon_5\lambda^{3m-1}\theta^3 = 0;$$

and the remainder of the proof is the same as that of Theorem 227.

It is not possible to prove in this way that

(13.5.2) $\xi^3 + \eta^3 + \epsilon\lambda^{3n+1}\gamma^3 \neq 0$.

In fact

$$1^3 + 2^3 + 9(-1)^3 = 0,$$

and, since $9 = \rho\lambda^4$,† this equation is of the form (13.5.2). The reader will
find it instructive to attempt the proof and observe where it fails.
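
The identification of 9 with a unity times the fourth power of λ can be checked in one line from Theorem 223. The short derivation below is our own worked version, using only ρ^3 = 1 and λ^2 = −3ρ; it also records how the numerical identity matches the shape of (13.5.2).

```latex
\lambda^2 = -3\rho
\;\Longrightarrow\;
\lambda^4 = 9\rho^2
\;\Longrightarrow\;
\rho\lambda^4 = 9\rho^3 = 9,
\qquad\text{so}\qquad
1^3 + 2^3 + \rho\,\lambda^{3\cdot 1+1}(-1)^3 = 0
\quad (\xi = 1,\ \eta = 2,\ \epsilon = \rho,\ n = 1,\ \gamma = -1).
```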

† See the proof of Theorem 223.

13.6. The expression of a rational as a sum of rational cubes.

Theorem 232 has a very interesting application to the ‘additive’ theory
of numbers.

The typical problem of this theory is as follows. Suppose that x denotes
an arbitrary member of a specified class of numbers, such as the class of
positive integers or the class of rationals, and y is a member of some
subclass of the former class, such as the class of integral squares or rational
cubes. Is it possible to express x in the form

$$x = y_1 + y_2 + \cdots + y_k;$$

and, if so, how economically, that is to say with how small a value of k?
For example, suppose x a positive integer and y an integral square.
Lagrange’s Theorem 369† shows that every positive integer is the sum of
four squares, so that we may take k = 4. Since 7, for example, is not a sum
of three squares, the value 4 of k is the least possible or the ‘correct’ one.

Here we shall suppose that x is a positive rational, and y a non-negative
rational cube, and we shall show that the ‘correct’ value of k is 3.

In the first place we have, as a corollary of Theorem 232,

Theorem 233. There are positive rationals which are not sums of two
non-negative rational cubes.

For example, 3 is such a rational. For

$$\left(\frac{a}{b}\right)^3 + \left(\frac{c}{d}\right)^3 = 3$$

involves

$$(ad)^3 + (bc)^3 = 3(bd)^3,$$

in contradiction to Theorem 232.‡

In order to show that 3 is an admissible value of k, we require another
theorem of a more elementary character.

Theorem 234. Any positive rational is the sum of three positive rational
cubes.

We have to solve

(13.6.1) $r = x^3 + y^3 + z^3$,

where r is given, with positive rational x, y, z. It is easily verified that

$$x^3 + y^3 + z^3 = (x + y + z)^3 - 3(y + z)(z + x)(x + y)$$

and so (13.6.1) is equivalent to

$$(x + y + z)^3 - 3(y + z)(z + x)(x + y) = r.$$

† Proved in various ways in Ch. XX.

‡ Theorem 227 shows that 1 is not the sum of two positive rational cubes, but it is of course
expressible as $0^3 + 1^3$.
If we write X = y + z, Y = z + x, Z = x + y, this becomes

(13.6.2) $(X + Y + Z)^3 - 24XYZ = 8r$.

If we put

(13.6.3) $u = \dfrac{X + Z}{Z}, \quad v = \dfrac{Y}{Z}$,

(13.6.2) becomes

(13.6.4) $(u + v)^3 - 24v(u - 1) = 8rZ^{-3}$.

Next we restrict Z and v to satisfy

(13.6.5) $r = 3Z^3v$,

so that (13.6.4) reduces to

(13.6.6) $(u + v)^3 = 24uv$.

To solve (13.6.6), we put u = vt and find that

(13.6.7) $u = \dfrac{24t^2}{(t + 1)^3}, \quad v = \dfrac{24t}{(t + 1)^3}$.

This is a solution of (13.6.6) for every rational t other than −1. We have still
to satisfy (13.6.5), which now becomes

$$r(t + 1)^3 = 72Z^3t.$$

If we put $t = r/(72w^3)$, where w is any rational number, we have
Z = w(t + 1). Hence a solution of (13.6.2) is

(13.6.8) $X = (u - 1)Z, \quad Y = vZ, \quad Z = w(t + 1)$,

where u, v are given by (13.6.7) with $t = rw^{-3}/72$. We deduce the solution
of (13.6.1) by using

(13.6.9) $2x = Y + Z - X, \quad 2y = Z + X - Y, \quad 2z = X + Y - Z$.


To complete the proof of Theorem 234, we have to show that we can
choose w so that x, y, z are all positive. If w is taken positive, then t and Z
are positive. Now, by (13.6.8) and (13.6.9) we have

$$\frac{2x}{Z} = v + 1 - (u - 1) = 2 + v - u, \quad \frac{2y}{Z} = u - v, \quad \frac{2z}{Z} = u + v - 2.$$

These are all positive provided that

$$u > v, \quad u - v < 2 < u + v,$$

that is

$$t > 1, \quad 12t(t - 1) < (t + 1)^3 < 12t(t + 1).$$

These are certainly true if t is a little greater than 1, and we may choose w
so that

$$t = \frac{r}{72w^3}$$

satisfies this requirement. (In fact, it is enough if $1 < t \le 2$.)

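Suppose for example that $r = \tfrac23$. If we put $w = \tfrac16$ so that t = 2, we have

$$\frac23 = \left(\frac{1}{18}\right)^3 + \left(\frac49\right)^3 + \left(\frac56\right)^3.$$

The equation

$$1 = \left(\frac12\right)^3 + \left(\frac23\right)^3 + \left(\frac56\right)^3,$$

which is equivalent to

(13.6.10) $6^3 = 3^3 + 4^3 + 5^3$,

is even simpler, but is not obtainable by this method.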
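
The construction of (13.6.7) to (13.6.9) is entirely explicit and can be followed step by step with exact rational arithmetic. The sketch below is one way to do so; the function name `three_cubes` is ours, and the caller is assumed to supply a positive rational w for which t = r/(72w^3) lies in the range 1 < t ≤ 2, so that x, y, z come out positive.

```python
from fractions import Fraction

def three_cubes(r, w):
    """Follow (13.6.7)-(13.6.9): return rationals x, y, z with x^3 + y^3 + z^3 = r."""
    r, w = Fraction(r), Fraction(w)
    t = r / (72 * w ** 3)
    u = 24 * t ** 2 / (t + 1) ** 3      # (13.6.7)
    v = 24 * t / (t + 1) ** 3
    Z = w * (t + 1)                     # (13.6.8)
    X = (u - 1) * Z
    Y = v * Z
    x = (Y + Z - X) / 2                 # (13.6.9)
    y = (Z + X - Y) / 2
    z = (X + Y - Z) / 2
    return x, y, z

x, y, z = three_cubes(Fraction(2, 3), Fraction(1, 6))
print(x, y, z, x ** 3 + y ** 3 + z ** 3)
```

With r = 2/3 and w = 1/6 (so that t = 2) this yields x = 1/18, y = 4/9, z = 5/6, whose cubes sum to 2/3.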

13.7. The equation $x^3 + y^3 + z^3 = t^3$. There are a number of other
Diophantine equations which it would be natural to consider here; and the
most interesting are

(13.7.1) $x^3 + y^3 + z^3 = t^3$

and

(13.7.2) $x^3 + y^3 = u^3 + v^3$.

The second equation is derived from the first by writing −u, v for z, t.

Each of the equations gives rise to a number of different problems, since
we may look for solutions in (a) integers or (b) rationals, and we may or
may not be interested in the signs of the solutions. The simplest problem
(and the only one which has been solved completely) is that of the solution
of the equations in positive or negative rationals. For this problem, the
equations are equivalent, and we take the form (13.7.2). The complete
solution was found by Euler and simplified by Binet.

If we put

$$x = X - Y, \quad y = X + Y, \quad u = U - V, \quad v = U + V,$$

(13.7.2) becomes

(13.7.3) $X(X^2 + 3Y^2) = U(U^2 + 3V^2)$.

We suppose that X and Y are not both 0. We may then write

$$\frac{U + V\sqrt{-3}}{X + Y\sqrt{-3}} = a + b\sqrt{-3}, \quad \frac{U - V\sqrt{-3}}{X - Y\sqrt{-3}} = a - b\sqrt{-3},$$

where a, b are rational. From the first of these

(13.7.4) $U = aX - 3bY, \quad V = bX + aY$,

while (13.7.3) becomes

$$X = U(a^2 + 3b^2).$$

This last, combined with the first of (13.7.4), gives us

$$cX = dY,$$

where

$$c = a(a^2 + 3b^2) - 1, \quad d = 3b(a^2 + 3b^2).$$

If c = d = 0, then b = 0, a = 1, X = U, Y = V. Otherwise

(13.7.5) $X = \lambda d = 3\lambda b(a^2 + 3b^2), \quad Y = \lambda c = \lambda\{a(a^2 + 3b^2) - 1\}$,

where $\lambda \neq 0$. Using these in (13.7.4), we find that

(13.7.6) $U = 3\lambda b, \quad V = \lambda\{(a^2 + 3b^2)^2 - a\}$.

Hence, apart from the two trivial solutions

$$X = Y = U = 0; \quad X = U,\ Y = V,$$

every rational solution of (13.7.3) takes the form given in (13.7.5) and
(13.7.6) for appropriate rational $\lambda$, a, b.

Conversely, if $\lambda$, a, b are any rational numbers and X, Y, U, V are defined
by (13.7.5) and (13.7.6), the formulae (13.7.4) follow at once and

$$U(U^2 + 3V^2) = 3\lambda b\{(aX - 3bY)^2 + 3(bX + aY)^2\}$$
$$= 3\lambda b(a^2 + 3b^2)(X^2 + 3Y^2) = X(X^2 + 3Y^2).$$

We have thus proved

Theorem 235. Apart from the trivial solutions

(13.7.7) $x = y = 0,\ u = -v; \quad x = u,\ y = v$,

the general rational solution of (13.7.2) is given by

(13.7.8)
$$x = \lambda\{1 - (a - 3b)(a^2 + 3b^2)\}, \quad y = \lambda\{(a + 3b)(a^2 + 3b^2) - 1\},$$
$$u = \lambda\{(a + 3b) - (a^2 + 3b^2)^2\}, \quad v = \lambda\{(a^2 + 3b^2)^2 - (a - 3b)\},$$

where $\lambda$, a, b are any rational numbers except that $\lambda \neq 0$.

The problem of finding all integral solutions of (13.7.2) is more difficult.
Integral values of a, b, and $\lambda$ in (13.7.8) give an integral solution, but there
is no converse correspondence. The simplest solution of (13.7.2) in positive
integers is

(13.7.9) $x = 1, \quad y = 12, \quad u = 9, \quad v = 10$,

corresponding to

$$a = \frac{10}{19}, \quad b = -\frac{7}{19}, \quad \lambda = -\frac{361}{42}.$$

On the other hand, if we put a = b = 1, $\lambda = \tfrac13$, we have

$$x = 3, \quad y = 5, \quad u = -4, \quad v = 6,$$

equivalent to (13.6.10).
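
Both numerical examples can be reproduced from (13.7.8) with exact rational arithmetic. In the sketch below the function name `euler_binet` and the argument name `lam` (for the multiplier written λ or k in the formulae) are our own choices.

```python
from fractions import Fraction

def euler_binet(a, b, lam):
    """Evaluate (13.7.8); the result satisfies x^3 + y^3 = u^3 + v^3."""
    a, b, lam = Fraction(a), Fraction(b), Fraction(lam)
    s = a * a + 3 * b * b
    x = lam * (1 - (a - 3 * b) * s)
    y = lam * ((a + 3 * b) * s - 1)
    u = lam * ((a + 3 * b) - s * s)
    v = lam * (s * s - (a - 3 * b))
    return x, y, u, v

print(euler_binet(1, 1, Fraction(1, 3)))
print(euler_binet(Fraction(10, 19), Fraction(-7, 19), Fraction(-361, 42)))
```

The first call gives (3, 5, −4, 6) and the second gives (1, 12, 9, 10), that is, the solution (13.7.9).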
Other simple solutions of (13.7.1) or (13.7.2) are

$$1^3 + 6^3 + 8^3 = 9^3, \quad 2^3 + 34^3 = 15^3 + 33^3, \quad 9^3 + 15^3 = 2^3 + 16^3.$$

Ramanujan gave

$$x = 3a^2 + 5ab - 5b^2, \quad y = 4a^2 - 4ab + 6b^2,$$
$$z = 5a^2 - 5ab - 3b^2, \quad t = 6a^2 - 4ab + 4b^2,$$

as a solution of (13.7.1). If we take a = 2, b = 1, we obtain the solution
(17, 14, 7, 20). If we take a = 1, b = −2, we obtain a solution equivalent
to (13.7.9). Other similar solutions are recorded in Dickson’s History.

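
Ramanujan's formulae are easily checked by machine. The following sketch (the function name `ramanujan` is ours) evaluates the four quadratic forms and confirms the two cases quoted in the text.

```python
def ramanujan(a, b):
    """Ramanujan's solution of x^3 + y^3 + z^3 = t^3 in terms of a and b."""
    x = 3 * a * a + 5 * a * b - 5 * b * b
    y = 4 * a * a - 4 * a * b + 6 * b * b
    z = 5 * a * a - 5 * a * b - 3 * b * b
    t = 6 * a * a - 4 * a * b + 4 * b * b
    return x, y, z, t

print(ramanujan(2, 1))     # (17, 14, 7, 20)
print(ramanujan(1, -2))    # (-27, 36, 3, 30)
```

Dividing the second result by 3 gives (−9, 12, 1, 10), that is, 1^3 + 12^3 = 9^3 + 10^3, the solution (13.7.9).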
Much less is known about the equation

(13.7.10) $x^4 + y^4 = u^4 + v^4$,

first solved by Euler. The simplest parametric solution known is

(13.7.11)
$$x = a^7 + a^5b^2 - 2a^3b^4 + 3a^2b^5 + ab^6,$$
$$y = a^6b - 3a^5b^2 - 2a^4b^3 + a^2b^5 + b^7,$$
$$u = a^7 + a^5b^2 - 2a^3b^4 - 3a^2b^5 + ab^6,$$
$$v = a^6b + 3a^5b^2 - 2a^4b^3 + a^2b^5 + b^7,$$

but this solution is not in any sense complete. When a = 1, b = 2 it leads to

$$133^4 + 134^4 = 158^4 + 59^4,$$

and this is the smallest integral solution of (13.7.10).

To solve (13.7.10), we put

(13.7.12) $x = a\omega + c, \quad y = b\omega - d, \quad u = a\omega + d, \quad v = b\omega + c$.

We thus obtain a quartic equation for $\omega$, in which the first and last
coefficients are zero. The coefficient of $\omega^3$ will also be zero if

$$c(a^3 - b^3) = d(a^3 + b^3),$$

in particular if $c = a^3 + b^3$, $d = a^3 - b^3$; and then, on dividing by $\omega$, we
find that

$$3\omega(a^2 - b^2)(c^2 - d^2) = 2(ad^3 - ac^3 + bc^3 + bd^3).$$

Finally, when we substitute these values of c, d, and $\omega$ in (13.7.12), and
multiply throughout by $3a^2b^2$, we obtain (13.7.11).
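
The parametric solution of (13.7.10) can be verified directly. In the sketch below the function name `quartic_solution` is ours, and the third polynomial is taken with the sign −3a^2b^5, which is what the substitution c = a^3 + b^3, d = a^3 − b^3 in (13.7.12) produces and what the numerical example with a = 1, b = 2 requires.

```python
def quartic_solution(a, b):
    """Parametric solution of x^4 + y^4 = u^4 + v^4 from (13.7.12)."""
    x = a**7 + a**5 * b**2 - 2 * a**3 * b**4 + 3 * a**2 * b**5 + a * b**6
    y = a**6 * b - 3 * a**5 * b**2 - 2 * a**4 * b**3 + a**2 * b**5 + b**7
    u = a**7 + a**5 * b**2 - 2 * a**3 * b**4 - 3 * a**2 * b**5 + a * b**6
    v = a**6 * b + 3 * a**5 * b**2 - 2 * a**4 * b**3 + a**2 * b**5 + b**7
    return x, y, u, v

x, y, u, v = quartic_solution(1, 2)
print(x, y, u, v, x**4 + y**4 == u**4 + v**4)
```

For a = 1, b = 2 this gives x = 133, y = 134, u = −59, v = 158, so that 133^4 + 134^4 = 158^4 + 59^4.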

We shall say something more about problems of this kind in Ch. XXI. 


NOTES 

§ 13.1. All this chapter, up to § 13.5, is modelled on Landau, Vorlesungen, iii. 201-17. 
See also Mordell, Diophantine equations , and the first pages of Cassels, J. London Math. 
Soc. 41 (1966), 193-291. 

The phrase ‘Diophantine equation’ is derived from Diophantus of Alexandria (about 
a.d. 250), who was the first writer to make a systematic study of the solution of equations 
in integers. Diophantus proved the substance of Theorem 225. Particular solutions had 
been known to Greek mathematicians from Pythagoras onwards. Heath’s Diophantus of 
Alexandria (Cambridge, 1910) includes translations of all the extant works of Diophantus, 
of Fermat’s comments on them, and of many solutions of Diophantine problems by Euler. 

There is a very large literature about ‘Fermat’s last theorem’. In particular we may
refer to Bachmann, Das Fermatproblem (1919; reprinted Berlin, Springer, 1976); Dickson,
History, ii, ch. xxvi; Landau, Vorlesungen, iii; Mordell, Three lectures on Fermat’s last
theorem (Cambridge, 1921); Vandiver, Report of the committee on algebraic numbers, ii
(Washington, 1928), ch. ii, and Amer. Math. Monthly, 53 (1946), 556-78. An excellent
account of the current state of knowledge about the theorem with full references is given by
Ribenboim (Canadian Math. Bull. 20 (1977), 229-42). For a more detailed account of the
subject and related theory, see Edwards, Fermat’s Last Theorem (Berlin, Springer, 1977).

The theorem was enunciated by Fermat in 1637 in a marginal note in his copy of Bachet’s
edition of the works of Diophantus. Here he asserts definitely that he possessed a proof,
but the later history of the subject seems to show that he must have been mistaken. A very
large number of fallacious proofs have been published.

In view of the remark at the beginning of § 13.4, we can suppose that n = p > 2.
Kummer (1850) proved the theorem for n = p, whenever the odd prime p is ‘regular’, i.e.
when p does not divide the numerator of any of the numbers

$$B_1, B_2, \ldots, B_{\frac12(p-3)},$$

where $B_k$ is the kth Bernoulli number defined at the beginning of § 7.9. It is known,
however, that there is an infinity of ‘irregular’ p. Various criteria have been developed
(notably by Vandiver) for the truth of the theorem when p is irregular. The corresponding
calculations have been carried out on a computer and, as a result, the theorem is now known
to be true for all p < 125000. If, however, (13.1.1) is satisfied for any larger prime, then
min(x, y) has more than 3 billion digits. See Ribenboim loc. cit. for references and Stewart,
Mathematika 24 (1977), 130-2 for another result.

The problem is much simplified if it is assumed that no one of x, y, z is divisible by p.
Wieferich proved in 1909 that there are no such solutions unless $2^{p-1} \equiv 1 \pmod{p^2}$, which
is true for p = 1093 (§ 6.10) but for no other p less than 2000. Later writers have found
further conditions of the same kind and by this means it has been shown that there are no
solutions of this kind for $p < 3 \times 10^9$ or for p any Mersenne prime (and so for the latest
known prime). See Ribenboim loc. cit.

Fermat’s Last Theorem was finally settled in a pair of papers by Wiles, and by Wiles
and Taylor (Ann. of Math. (2) 141 (1995), 443-551 and 553-72). Unlike its predecessors
described above, this work uses a connection between Fermat’s equation and elliptic curves.
Investigations by Hellegouarch, Frey, and Ribet had previously established that Fermat’s
Last Theorem would follow from a standard conjecture on elliptic curves, namely the
Taniyama-Shimura conjecture. Wiles was able to establish an important special case of
the latter conjecture, which was sufficient to handle Fermat’s Last Theorem. The paper by
Wiles and Taylor provided the proof of a key step needed for Wiles’ work.

§ 13.3. Theorem 226 was actually proved by Fermat. See Dickson, History , ii, ch. xxii. 
§ 13.4. Theorem 227 was proved by Euler between 1753 and 1770. The proof was 
incomplete at one point, but the gap was filled by Legendre. See Dickson, History , ii, 
ch. xxi. 

Our proof follows that given by Landau, but Landau presents it as a first exercise in the 
use of ideals, which we have to avoid. 

§ 13.6. Theorem 234 is due to Richmond, Proc. London Math. Soc. (2) 21 (1923), 401-9.
His proof is based on formulae given much earlier by Ryley [The ladies’ diary (1825), 35].
Ryley’s formulae have been reconsidered and generalized by Richmond [Proc.
Edinburgh Math. Soc. (2) 2 (1930), 92-100, and Journal London Math. Soc. 17 (1942),
196-9] and Mordell [Journal London Math. Soc. 17 (1942), 194-6]. Richmond finds
solutions not included in Ryley’s; for example,

$$3(1 - t + t^2)x = s(1 + t^3), \quad 3(1 - t + t^2)y = s(3t - 1 - t^3),$$
$$3(1 - t + t^2)z = s(3t - 3t^2),$$

where s is rational and $t = 3r/s^3$. Mordell solves the more general equation

$$(X + Y + Z)^3 - dXYZ = m,$$

of which (13.6.2) is a particular case. Our presentation of the proof is based on Mordell’s.
There are a number of other papers on cubic Diophantine equations in three variables, by
Mordell and B. Segre, in later numbers of the Journal. Indeed Segre (Math. Notae, 11
(1951), 1-68), has shown that if any non-degenerate cubic equation in three variables has
a rational solution, it will have infinitely many solutions. This suffices to handle (13.6.1),
which has a rational point ‘at infinity’. A full account of much recent work on homogeneous
equations of degree 3 and 4 variables is given by Manin (Cubic forms, Amsterdam, North
Holland, 1974).

§ 13.7. The first results concerning ‘equal sums of two cubes’ were found by Vieta before 
1591. See Dickson, History, ii. 550 et seq. Theorem 235 is due to Euler. Our method follows 
that of Hurwitz, Math. Werke, 2 (1933), 469-70. 

The parameterization (13.7.8) has maximal degree 4 in a and b. There is an alternative 
parameterization of degree 3, namely 

$$x = \lambda(A + B + C - D), \quad y = \lambda(A + B - C + D),$$
$$u = \lambda(A - B + C + D), \quad v = \lambda(A - B - C - D),$$

where 

$$A = 9a^3 + 3ab^2 + 3a, \quad B = 6ab, \quad C = 9a^2b + 3b^3 + b, \quad D = 3a^2 + 3b^2 + 1,$$

see Hua, Introduction to number theory, (Springer, New York, 1982), 290-91. 
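
The degree-3 parameterization can be checked numerically. The following Python sketch is not part of the original notes; the function names are ours, and it assumes that the last term of A is 3a, the reading for which $x^3 + y^3 = u^3 + v^3$ holds identically.

```python
def hua_parameterization(a, b, lam=1):
    """Return (x, y, u, v) from the degree-3 parameterization quoted above."""
    A = 9 * a**3 + 3 * a * b**2 + 3 * a   # last term taken to be 3a (assumption)
    B = 6 * a * b
    C = 9 * a**2 * b + 3 * b**3 + b
    D = 3 * a**2 + 3 * b**2 + 1
    x = lam * (A + B + C - D)
    y = lam * (A + B - C + D)
    u = lam * (A - B + C + D)
    v = lam * (A - B - C - D)
    return x, y, u, v


def is_equal_sum_of_cubes(x, y, u, v):
    """Check x^3 + y^3 == u^3 + v^3 in exact integer arithmetic."""
    return x**3 + y**3 == u**3 + v**3


# Spot check over a small grid of parameters.
for a in range(-3, 4):
    for b in range(-3, 4):
        assert is_equal_sum_of_cubes(*hua_parameterization(a, b))
```

With a = b = 1 the sketch gives x = 27, y = 15, u = 29, v = -11, and indeed 27^3 + 15^3 = 29^3 - 11^3 = 23058.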

Euler’s solution of (13.7.10) is given in Dickson, Introduction, 60-62. His formulae, 
which are not quite so simple as (13.7.11), may be derived from the latter by writing f + g 
and f - g for a and b and dividing by 2. The formulae (13.7.11) themselves were first given 
by Gérardin, L’Intermédiaire des mathématiciens, 24 (1917), 51. The simple solution here 
is due to Swinnerton-Dyer, Journal London Math. Soc. 18 (1943), 2-4. 

Leech (Proc. Cambridge Phil. Soc. 53 (1957), 778-80) lists numerical solutions of 
(13.7.2), of (13.7.10), and of several other Diophantine equations. 

In 1844 Catalan conjectured that the only solution in integers p, q, x, y, each greater 
than 1, of the equation 

$$x^p - y^q = 1$$

is p = y = 2, q = x = 3. This has been proved by Mihăilescu (J. Reine Angew. Math. 572 
(2004), 167-195). 

One of the most powerful results on Diophantine equations is due to Faltings (Invent. 
Math. 73 (1983), 349-66). A special case of this relates to equations of the form 
f(x, y, z) = 0, where f is a homogeneous polynomial of degree at least 4, with integral 
coefficients. One says that f is nonsingular if the partial derivatives of f cannot vanish 
simultaneously for any complex (x, y, z) apart from (0, 0, 0). For such an f, Faltings’s 
theorem asserts that the equation f(x, y, z) = 0 has at most finitely many distinct solutions, up 
to multiplication by a constant. One may take $f(x, y, z) = ax^n + by^n - cz^n$ for $n \ge 4$, and 
deduce that the generalized Fermat equation has at most finitely many essentially distinct 
solutions for each n. 

Many of the equations considered in this chapter take the form a + b = c, where a, b and 
c are constant multiples of powers. A very general conjecture about such equations, now 
known as the ‘abc conjecture’, has been made by Oesterlé and by Masser in 1985. It states 
that if $\varepsilon > 0$ there is a constant $K(\varepsilon)$ with the following property. If a, b, c are any positive 
integers such that a + b = c, then $c \le K(\varepsilon)\, r(abc)^{1+\varepsilon}$, where the function r(m) is defined 
as the product of the distinct prime factors of m. 
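
To make the function r(m) concrete, here is a short Python sketch (our own illustration, not from the original notes; the names `radical` and `abc_ratio` are hypothetical). It computes r(m) by trial division and compares c with r(abc) for two triples a + b = c. Both triples are chosen without a common factor, as is usual when the conjecture is applied.

```python
def radical(m):
    """r(m): the product of the distinct prime factors of m, with r(1) = 1."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            result *= p
            while m % p == 0:   # strip every power of p
                m //= p
        p += 1
    if m > 1:                   # whatever remains is a single prime
        result *= m
    return result


def abc_ratio(a, b):
    """Return (c, r(abc)) for the triple a + b = c."""
    c = a + b
    return c, radical(a * b * c)


print(abc_ratio(1, 8))      # the triple 1 + 8 = 9
print(abc_ratio(3, 125))    # the triple 3 + 125 = 128
```

In both examples c exceeds r(abc) (9 against 6, and 128 against 30), which is why the exponent $1 + \varepsilon$ and the constant $K(\varepsilon)$ cannot simply be dropped from the statement.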

As an example of the potential applications of this conjecture, consider the Fermat 
equation (13.1.1). Taking $a = x^n$, $b = y^n$ and $c = z^n$, we observe that 

$$r(abc) = r(x^n y^n z^n) \le xyz < z^3,$$

whence the conjecture would yield $z^n \le K(\varepsilon) z^{3(1+\varepsilon)}$. Choosing $\varepsilon = 1/6$, and assuming 
that $n \ge 4$, we would then have 

$$z^n \le K(1/6)\, z^{7/2} \le K(1/6)\, z^{7n/8}.$$

From this we can deduce that $z^n \le K(1/6)^8$. Thus the abc conjecture immediately implies 
that Fermat’s equation has at most finitely many solutions in x, y, z, n, for $n \ge 4$. In fact 
a whole host of other important results and conjectures are now known to follow from the 
abc conjecture. 



XIV 

QUADRATIC FIELDS (1) 


14.1. Algebraic fields. In Ch. XII we considered the integers of k(i) 
and $k(\rho)$, but did not develop the theory farther than was necessary for the 
purposes of Ch. XIII. In this and the next chapter we carry our investigation 
of the integers of quadratic fields a little farther. 

An algebraic field is the aggregate of all numbers 

$$R(\vartheta) = \frac{P(\vartheta)}{Q(\vartheta)},$$

where $\vartheta$ is a given algebraic number, $P(\vartheta)$ and $Q(\vartheta)$ are polynomials in 
$\vartheta$ with rational coefficients, and $Q(\vartheta) \ne 0$. We denote this field by $k(\vartheta)$. 

It is plain that sums and products of numbers of $k(\vartheta)$ belong to $k(\vartheta)$ and 
that $\alpha/\beta$ belongs to $k(\vartheta)$ if $\alpha$ and $\beta$ belong to $k(\vartheta)$ and $\beta \ne 0$. 

In § 11.5, we defined an algebraic number $\xi$ as any root of an algebraic 
equation 

(14.1.1) $a_0x^n + a_1x^{n-1} + \cdots + a_n = 0,$

where $a_0, a_1, \ldots$ are rational integers, not all zero. If $\xi$ satisfies an algebraic 
equation of degree n, but none of lower degree, we say that $\xi$ is of 
degree n. 

If n = 1, then $\xi$ is rational and $k(\xi)$ is the aggregate of rationals. Hence, 
for every rational $\xi$, $k(\xi)$ denotes the same aggregate, the field of rationals, 
which we denote by k(1). This field is part of every algebraic field. 

If n = 2, we say that $\xi$ is ‘quadratic’. Then $\xi$ is a root of a quadratic 
equation 

$$a_0x^2 + a_1x + a_2 = 0,$$

and so 

$$\xi = \frac{a + b\sqrt{m}}{c}, \quad \sqrt{m} = \frac{c\xi - a}{b}$$

for some rational integers a, b, c, m. Without loss of generality, we may 
take m to have no squared factor. It is then easily verified that the field 
$k(\xi)$ is the same aggregate as $k(\sqrt{m})$. Hence it will be enough for us to 
consider the quadratic fields $k(\sqrt{m})$ for every ‘quadratfrei’ rational integer 
m, positive or negative (apart from m = 1). 

Any member $\xi$ of $k(\sqrt{m})$ has the form 

$$\xi = \frac{P(\sqrt{m})}{Q(\sqrt{m})} = \frac{t + u\sqrt{m}}{v + w\sqrt{m}} = \frac{(t + u\sqrt{m})(v - w\sqrt{m})}{v^2 - w^2m} = \frac{a + b\sqrt{m}}{c}$$

for rational integers t, u, v, w, a, b, c. We have $(c\xi - a)^2 = mb^2$, and so $\xi$ 
is a root of 

(14.1.2) $c^2x^2 - 2acx + a^2 - mb^2 = 0.$

Hence $\xi$ is either rational or quadratic; i.e. every member of a quadratic 
field is either a rational or a quadratic number. 

The field $k(\sqrt{m})$ includes a sub-class formed by all the algebraic integers 
of the field. In § 12.1 we defined an algebraic integer as any root of an 
equation 

(14.1.3) $x^j + c_1x^{j-1} + \cdots + c_j = 0,$

where $c_1, \ldots, c_j$ are rational integers. We appear then to have a choice in 
defining the integers of $k(\sqrt{m})$. We may say that a number $\xi$ of $k(\sqrt{m})$ is 
an integer of $k(\sqrt{m})$ (i) if $\xi$ satisfies an equation of the form (14.1.3) for 
some j, or (ii) if $\xi$ satisfies an equation of the form (14.1.3) with j = 2. In 
the next section, however, we show that the set of integers of $k(\sqrt{m})$ is the 
same whichever definition we use. 

14.2. Algebraic numbers and integers; primitive polynomials. We 
say that the integral polynomial 

(14.2.1) $f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n$

is a primitive polynomial if 

$$a_0 > 0, \quad (a_0, a_1, \ldots, a_n) = 1$$

in the notation of p. 20. Under the same conditions, we call (14.1.1) a 
primitive equation. The equation (14.1.3) is obviously primitive. 

Theorem 236. An algebraic number $\xi$ of degree n satisfies a unique 
primitive equation of degree n. If $\xi$ is an algebraic integer, the coefficient 
of $x^n$ in this primitive equation is unity. 

For n = 1, the first part is trivial; the second part is equivalent to 
Theorem 206. Hence Theorem 236 is a generalization of Theorem 206. We 
shall deduce Theorem 236 from 

Theorem 237. Let $\xi$ be an algebraic number of degree n and let f(x) = 0 
be a primitive equation of degree n satisfied by $\xi$. Let g(x) = 0 be any 
primitive equation satisfied by $\xi$. Then g(x) = f(x)h(x) for some primitive 
polynomial h(x) and all x. 

By the definition of $\xi$ and n there must be at least one polynomial f(x) of 
degree n such that $f(\xi) = 0$. We may clearly suppose f(x) primitive. Again 
the degree of g(x) cannot be less than n. Hence we can divide g(x) by 
f(x) by means of the division algorithm of elementary algebra and obtain 
a quotient H(x) and a remainder K(x), such that 

(14.2.2) $g(x) = f(x)H(x) + K(x),$

H(x) and K(x) are polynomials with rational coefficients, and K(x) is of 
degree less than n. 

If we put $x = \xi$ in (14.2.2), we have $K(\xi) = 0$. But this is impossible, 
since $\xi$ is of degree n, unless K(x) has all its coefficients zero. Hence 

$$g(x) = f(x)H(x).$$

If we multiply this throughout by an appropriate rational integer, we obtain 

(14.2.3) $cg(x) = f(x)h(x),$

where c is a positive integer and h(x) is an integral polynomial. Let d be the 
highest common divisor of the coefficients of h(x). Since g is primitive, 
we must have d | c. Hence, if d > 1, we may remove the factor d; that is, 
we may take h(x) primitive in (14.2.3). Now suppose that p | c, where p is 
prime. It follows that $f(x)h(x) \equiv 0 \pmod{p}$ and so, by Theorem 104 (i), 
either $f(x) \equiv 0$ or $h(x) \equiv 0 \pmod{p}$. Both are impossible for primitive f 
and h and so c = 1. This is Theorem 237. 

The proof of Theorem 236 is now simple. If g(x) = 0 is a primitive 
equation of degree n satisfied by $\xi$, then h(x) is a primitive polynomial of 
degree 0; i.e. h(x) = 1 and g(x) = f(x) for all x. Hence f(x) is unique. 

If $\xi$ is an algebraic integer, then $\xi$ satisfies an equation of the form 
(14.1.3) for some $j \ge n$. We write g(x) for the left-hand side of (14.1.3) 
and, by Theorem 237, we have 

$$g(x) = f(x)h(x),$$

where h(x) is of degree $j - n$. If $f(x) = a_0x^n + \cdots$ and $h(x) = h_0x^{j-n} + \cdots$, 
we have $1 = a_0h_0$, and so $a_0 = 1$. This completes the proof of 
Theorem 236. 


14.3. The general quadratic field $k(\sqrt{m})$. We now define the integers 
of $k(\sqrt{m})$ as those algebraic integers which belong to $k(\sqrt{m})$. We use 
‘integer’ throughout this chapter and Ch. XV for an integer of the particular 
field in which we are working. 

With the notation of § 14.1, let 

$$\xi = \frac{a + b\sqrt{m}}{c}$$

be an integer, where we may suppose that c > 0 and (a, b, c) = 1. If b = 0, 
then $\xi = a/c$ is rational, c = 1, and $\xi = a$, any rational integer. 

If $b \ne 0$, $\xi$ is quadratic. Hence, if we divide (14.1.2) through by $c^2$, we 
obtain a primitive equation whose leading coefficient is 1. Thus c | 2a and 
$c^2 \mid (a^2 - mb^2)$. If d = (a, c), we have 

$$d^2 \mid a^2, \quad d^2 \mid c^2, \quad d^2 \mid (a^2 - mb^2) \to d^2 \mid mb^2 \to d \mid b,$$

since m has no squared factor. But (a, b, c) = 1 and so d = 1. Since c | 2a, 
we have c = 1 or 2. 

If c = 2, then a is odd and $mb^2 \equiv a^2 \equiv 1 \pmod{4}$, so that b is odd and 
$m \equiv 1 \pmod{4}$. We must therefore distinguish two cases. 

(i) If $m \not\equiv 1 \pmod{4}$, then c = 1 and the integers of $k(\sqrt{m})$ are 

$$\xi = a + b\sqrt{m}$$

with rational integral a, b. In this case $m \equiv 2$ or $m \equiv 3 \pmod{4}$. 

(ii) If $m \equiv 1 \pmod{4}$, one integer of $k(\sqrt{m})$ is $\tau = \tfrac{1}{2}(\sqrt{m} - 1)$ and all 
the integers can be expressed simply in terms of this $\tau$. If c = 2, we have 
a and b odd and 

$$\xi = \frac{a + b\sqrt{m}}{2} = \frac{a + b}{2} + b\tau = a_1 + (2b_1 + 1)\tau,$$

where $a_1$, $b_1$ are rational integers. If c = 1, 

$$\xi = a + b\sqrt{m} = a + b + 2b\tau = a_1 + 2b_1\tau,$$

where $a_1$, $b_1$ are rational integers. Hence, if we change our notation a little, 
the integers of $k(\sqrt{m})$ are the numbers $a + b\tau$ with rational integral a, b. 

Theorem 238. The integers of $k(\sqrt{m})$ are the numbers 

$$a + b\sqrt{m}$$

when $m \equiv 2$ or $m \equiv 3 \pmod{4}$, and the numbers 

$$a + b\tau = a + \tfrac{1}{2}b(\sqrt{m} - 1)$$

when $m \equiv 1 \pmod{4}$, a and b being in either case rational integers. 

The field k(i) is an example of the first case and the field $k\{\sqrt{(-3)}\}$ of 
the second. In the latter case 

$$\tau = -\tfrac{1}{2} + \tfrac{1}{2}i\sqrt{3} = \rho$$

and the field is the same as $k(\rho)$. If the integers of $k(\vartheta)$ can be 
expressed as 

$$a + b\phi,$$

where a and b run through the rational integers, then we say that $[1, \phi]$ is 
a basis of the integers of $k(\vartheta)$. Thus [1, i] is a basis of the integers of k(i), 
and $[1, \rho]$ of those of $k\{\sqrt{(-3)}\}$. 
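
Theorem 238 can be tested numerically. A number $r + s\sqrt{m}$ with rational r, s satisfies $x^2 - 2rx + (r^2 - ms^2) = 0$, so by Theorem 236 it is an integer exactly when 2r and $r^2 - ms^2$ are rational integers. The Python sketch below (ours, not the book's; the function names are hypothetical) compares this criterion with the description in Theorem 238, writing $\tau = \tfrac{1}{2}(\sqrt{m} - 1)$ when $m \equiv 1 \pmod{4}$.

```python
from fractions import Fraction


def is_integer_by_equation(r, s, m):
    """Integrality of r + s*sqrt(m) via the coefficients 2r and r^2 - m*s^2."""
    r, s = Fraction(r), Fraction(s)
    trace = 2 * r
    norm = r * r - m * s * s
    return trace.denominator == 1 and norm.denominator == 1


def is_integer_by_theorem_238(r, s, m):
    """Integrality of r + s*sqrt(m) as described in Theorem 238 (m square-free)."""
    r, s = Fraction(r), Fraction(s)
    if m % 4 == 1:
        # r + s*sqrt(m) = x + y*tau with y = 2s and x = r + s
        return (2 * s).denominator == 1 and (r + s).denominator == 1
    return r.denominator == 1 and s.denominator == 1


def criteria_agree(m, denominators=(1, 2, 3, 4), span=6):
    """Compare the two criteria on a grid of rational r, s."""
    values = [Fraction(k, d) for d in denominators for k in range(-span, span + 1)]
    return all(
        is_integer_by_equation(r, s, m) == is_integer_by_theorem_238(r, s, m)
        for r in values for s in values
    )


for m in (-7, -5, -3, -2, -1, 2, 3, 5, 6, 7, 10, 13):
    assert criteria_agree(m)
```

For example $\tfrac{1}{2} + \tfrac{1}{2}\sqrt{5}$ passes both tests, while $\tfrac{1}{2} + \tfrac{1}{2}\sqrt{3}$ fails both, since its norm is $-\tfrac{1}{2}$.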

14.4. Unities and primes. The definitions of divisibility, divisor, unity, 
and prime in $k(\sqrt{m})$ are the same as in k(i); thus $\alpha$ is divisible by $\beta$, or 
$\beta \mid \alpha$, if there is an integer $\gamma$ of $k(\sqrt{m})$ such that $\alpha = \beta\gamma$.† A unity $\varepsilon$ is a 
divisor of 1, and of every integer of the field. In particular 1 and -1 are 
unities. The numbers $\varepsilon\xi$ are the associates of $\xi$, and a prime is a number 
divisible only by the unities and its associates. 

Theorem 239. If $\varepsilon_1$ and $\varepsilon_2$ are unities, then $\varepsilon_1\varepsilon_2$ and $\varepsilon_1/\varepsilon_2$ are unities. 

There are a $\delta_1$ and a $\delta_2$ such that $\varepsilon_1\delta_1 = 1$, $\varepsilon_2\delta_2 = 1$, and 

$$\varepsilon_1\varepsilon_2\delta_1\delta_2 = 1 \to \varepsilon_1\varepsilon_2 \mid 1.$$

Hence $\varepsilon_1\varepsilon_2$ is a unity. Also $\delta_2 = 1/\varepsilon_2$ is a unity; and so, combining these 
results, $\varepsilon_1/\varepsilon_2$ is a unity. 

We call $\bar{\xi} = r - s\sqrt{m}$ the conjugate of $\xi = r + s\sqrt{m}$. When m < 0, $\bar{\xi}$ 
is also the conjugate of $\xi$ in the sense of analysis, $\xi$ and $\bar{\xi}$ being conjugate 
complex numbers; but when m > 0 the meaning is different. 

† If $\alpha$ and $\beta$ are rational integers, then $\gamma$ is rational, and so a rational integer, so that $\beta \mid \alpha$ then means 
the same in $k(\sqrt{m})$ as in k(1). 

The norm $N\xi$ of $\xi$ is defined by 

$$N\xi = \xi\bar{\xi} = (r + s\sqrt{m})(r - s\sqrt{m}) = r^2 - ms^2.$$

If $\xi$ is an integer, then $N\xi$ is a rational integer. If $m \equiv 2$ or $3 \pmod{4}$, and 
$\xi = a + b\sqrt{m}$, then 

$$N\xi = a^2 - mb^2;$$

and if $m \equiv 1 \pmod{4}$, and $\xi = a + b\tau$, then 

$$N\xi = (a - \tfrac{1}{2}b)^2 - \tfrac{1}{4}mb^2.$$

Norms are positive in complex fields, but not necessarily in real fields. In 
any case $N(\xi\eta) = N\xi\, N\eta$. 

Theorem 240. The norm of a unity is ±1, and every number whose norm 
is ±1 is a unity. 

For (a) 

$$\varepsilon \mid 1 \to \varepsilon\delta = 1 \to N\varepsilon\, N\delta = 1 \to N\varepsilon = \pm 1,$$

and (b) 

$$\xi\bar{\xi} = N\xi = \pm 1 \to \xi \mid 1.$$

If m < 0, $m = -\mu$, then the equations 

$$a^2 + \mu b^2 = 1 \quad (m \equiv 2, 3 \pmod{4}),$$
$$(a - \tfrac{1}{2}b)^2 + \tfrac{1}{4}\mu b^2 = 1 \quad (m \equiv 1 \pmod{4}),$$

have only a finite number of solutions. This number is 4 in k(i), 6 in $k(\rho)$, 
and 2 otherwise, since 

$$a = \pm 1, \quad b = 0$$

are the only solutions when $\mu > 3$. 

There are an infinity of unities in a real field, as we shall see in a moment 
in $k(\sqrt{2})$. 
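
The count of unities in a complex field is a finite search, so it can be confirmed directly. The sketch below is our own illustration (the function name and the search bound are assumptions, not from the text); it solves the two norm equations above, the second after multiplication by 4.

```python
def count_unities(m, bound=3):
    """Count the unities of k(sqrt(m)) for square-free m < 0 by solving N(xi) = 1.

    The bound 3 is more than enough: every solution has |a| <= 1 and |b| <= 1.
    """
    mu = -m
    count = 0
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if m % 4 == 1:
                # xi = a + b*tau, and 4*N(xi) = (2a - b)^2 + mu*b^2
                if (2 * a - b) ** 2 + mu * b * b == 4:
                    count += 1
            elif a * a + mu * b * b == 1:
                count += 1
    return count


for m in (-1, -2, -3, -5, -7, -11):
    print(m, count_unities(m))
```

The output shows 4 unities for m = -1, 6 for m = -3, and 2 in every other case tried, in agreement with the statement above.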

$N\xi$ may be negative in a real field, but 

$$M\xi = |N\xi|$$

is a positive integer, except when $\xi = 0$. Hence, repeating the arguments 
of § 12.7, with $M\xi$ in the place of $N\xi$ when the field is real, we obtain 

Theorem 241. An integer whose norm is a rational prime is prime. 

Theorem 242. An integer, not 0 or a unity, can be expressed as a product 
of primes. 

The question of the uniqueness of the expression remains open. 

14.5. The unities of $k(\sqrt{2})$. When m = 2, 

$$N\xi = a^2 - 2b^2$$

and 

$$a^2 - 2b^2 = -1$$

has the solutions 1, 1 and -1, 1. Hence 

$$\omega = 1 + \sqrt{2}, \quad \omega^{-1} = -\bar{\omega} = -1 + \sqrt{2}$$

are unities. It follows, after Theorem 239, that all the numbers 

(14.5.1) $\pm\omega^n, \ \pm\omega^{-n} \quad (n = 0, 1, 2, \ldots)$

are unities. There are unities, of either sign, as large or as small as we 
please. 

Theorem 243. The numbers (14.5.1) are the only unities of $k(\sqrt{2})$. 

(i) We prove first that there is no unity $\varepsilon$ between 1 and $\omega$. If there were, 
we should have 

$$1 < x + y\sqrt{2} = \varepsilon < 1 + \sqrt{2}$$

and 

$$x^2 - 2y^2 = \pm 1;$$

so that 

$$-1 < x - y\sqrt{2} < 1,$$
$$0 < 2x < 2 + \sqrt{2}.$$

Hence x = 1 and $1 < 1 + y\sqrt{2} < 1 + \sqrt{2}$, which is impossible for integral y. 

(ii) If $\varepsilon > 0$, then either $\varepsilon = \omega^n$ or 

$$\omega^n < \varepsilon < \omega^{n+1}$$

for some integral n. In the latter case $\omega^{-n}\varepsilon$ is a unity, by Theorem 239, and 
lies between 1 and $\omega$. This contradicts (i); and therefore every positive $\varepsilon$ is 
an $\omega^n$. Since $-\varepsilon$ is a unity if $\varepsilon$ is a unity, this proves the theorem. 

Since $N\omega = -1$, $N\omega^2 = 1$, we have proved incidentally 

Theorem 244. All rational integral solutions of 

$$x^2 - 2y^2 = 1$$

are given by 

$$x + y\sqrt{2} = \pm(1 + \sqrt{2})^{2n},$$

and all of 

$$x^2 - 2y^2 = -1$$

by 

$$x + y\sqrt{2} = \pm(1 + \sqrt{2})^{2n+1},$$

with n a rational integer. 
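
Theorem 244 lends itself to a direct check. The Python sketch below (an illustration of ours; the helper names are hypothetical) forms the powers of $1 + \sqrt{2}$ in exact integer arithmetic and compares them with a brute-force search for positive solutions of $x^2 - 2y^2 = \pm 1$. Negative exponents and the sign ± only change the signs of x and y, so positive solutions suffice.

```python
def omega_power(n):
    """Return (x, y) with x + y*sqrt(2) = (1 + sqrt(2))**n, for n >= 0."""
    x, y = 1, 0
    for _ in range(n):
        # (x + y*sqrt(2)) * (1 + sqrt(2)) = (x + 2y) + (x + y)*sqrt(2)
        x, y = x + 2 * y, x + y
    return x, y


def brute_force_solutions(limit):
    """All (x, y) with 1 <= x, y <= limit and x^2 - 2y^2 = +1 or -1."""
    return {
        (x, y)
        for x in range(1, limit + 1)
        for y in range(1, limit + 1)
        if abs(x * x - 2 * y * y) == 1
    }


for n in range(1, 7):
    x, y = omega_power(n)
    print(n, (x, y), x * x - 2 * y * y)
```

The norm alternates between -1 for odd n and +1 for even n, as the theorem requires, and the brute-force search up to 200 finds no solution that is not a power of $1 + \sqrt{2}$.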
The equation 

$$x^2 - my^2 = 1,$$

where m is positive and not a square, has always an infinity of solutions, 
which may be found from the continued fraction for $\sqrt{m}$. When m = 2, 

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}},$$

the length of the period is 1, and the solution is particularly simple. If the 
convergents are 

$$\frac{p_n}{q_n} = \frac{1}{1}, \frac{3}{2}, \frac{7}{5}, \ldots \quad (n = 0, 1, 2, \ldots)$$

then $p_n$, $q_n$, and 

$$\phi_n = p_n + q_n\sqrt{2}, \quad \psi_n = p_n - q_n\sqrt{2}$$

are solutions of 

$$x_n = 2x_{n-1} + x_{n-2}.$$

From 

$$\phi_0 = \omega, \quad \phi_1 = \omega^2, \quad \psi_0 = -\omega^{-1}, \quad \psi_1 = \omega^{-2}$$

and 

$$\omega^n = 2\omega^{n-1} + \omega^{n-2}, \quad (-\omega)^{-n} = 2(-\omega)^{-n+1} + (-\omega)^{-n+2},$$

it follows that 

$$\phi_n = \omega^{n+1}, \quad \psi_n = (-\omega)^{-n-1}$$

for all n. Hence 

$$p_n = \tfrac{1}{2}\{\omega^{n+1} + (-\omega)^{-n-1}\} = \tfrac{1}{2}\{(1 + \sqrt{2})^{n+1} + (1 - \sqrt{2})^{n+1}\},$$
$$q_n = \tfrac{1}{4}\sqrt{2}\{\omega^{n+1} - (-\omega)^{-n-1}\} = \tfrac{1}{4}\sqrt{2}\{(1 + \sqrt{2})^{n+1} - (1 - \sqrt{2})^{n+1}\},$$

and 

$$p_n^2 - 2q_n^2 = \phi_n\psi_n = (-1)^{n+1}.$$

The convergents of odd rank give solutions of $x^2 - 2y^2 = 1$ and those of 
even rank solutions of $x^2 - 2y^2 = -1$. 
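
The recurrence for the convergents is easy to run. The following sketch is ours (the function name is hypothetical, and the starting values $p_{-1} = 1$, $q_{-1} = 0$ are the usual convention for continued fractions, assumed here); it generates $p_n/q_n$ from $x_n = 2x_{n-1} + x_{n-2}$ and evaluates $p_n^2 - 2q_n^2$.

```python
def convergents_sqrt2(count):
    """First `count` convergents (p_n, q_n) of sqrt(2), for n = 0, 1, 2, ..."""
    p_prev, q_prev = 1, 0   # conventional p_{-1}, q_{-1}
    p, q = 1, 1             # p_0 / q_0 = 1/1
    out = []
    for _ in range(count):
        out.append((p, q))
        # every partial quotient after the first is 2
        p, p_prev = 2 * p + p_prev, p
        q, q_prev = 2 * q + q_prev, q
    return out


for n, (p, q) in enumerate(convergents_sqrt2(8)):
    print(n, f"{p}/{q}", p * p - 2 * q * q)
```

The last column alternates -1, +1, -1, ..., that is, it equals $(-1)^{n+1}$, so the convergents of odd rank solve $x^2 - 2y^2 = 1$.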

If $x^2 - 2y^2 = 1$ and x/y > 0, then 

$$0 < \frac{x}{y} - \sqrt{2} = \frac{1}{y(x + y\sqrt{2})} < \frac{1}{y \cdot 2y\sqrt{2}} < \frac{1}{2y^2}.$$

Hence, by Theorem 184, x/y is a convergent. The convergents also give 
all the solutions of the other equation, but this is not quite so easy to prove. 
In general, only some of the convergents to $\sqrt{m}$ yield unities of $k(\sqrt{m})$. 

14.6. Fields in which the fundamental theorem is false. The fundamental 
theorem of arithmetic is true in k(1), k(i), $k(\rho)$, and (though we 
have not yet proved so) in $k(\sqrt{2})$. It is important to show by examples, 
before proceeding farther, that it is not true in every $k(\sqrt{m})$. The simplest 
examples are m = -5 and (among real fields) m = 10. 

(i) Since $-5 \equiv 3 \pmod{4}$, the integers of $k\{\sqrt{(-5)}\}$ are $a + b\sqrt{(-5)}$. 
It is easy to verify that the four numbers 

$$2, \quad 3, \quad 1 + \sqrt{(-5)}, \quad 1 - \sqrt{(-5)}$$

are prime. Thus 

$$1 + \sqrt{(-5)} = \{a + b\sqrt{(-5)}\}\{c + d\sqrt{(-5)}\}$$

implies 

$$6 = (a^2 + 5b^2)(c^2 + 5d^2);$$

and $a^2 + 5b^2$ must be 2 or 3, if neither factor is a unity. Since neither 2 
nor 3 is of this form, $1 + \sqrt{(-5)}$ is prime; and the other numbers may be 
proved prime similarly. But 

$$6 = 2 \cdot 3 = \{1 + \sqrt{(-5)}\}\{1 - \sqrt{(-5)}\},$$

and 6 has two distinct decompositions into primes. 
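
The norm argument in (i) amounts to a finite search, sketched below in Python (our illustration; the names `elements_of_norm` and `multiply` are hypothetical). A proper factor of 2, 3 or $1 \pm \sqrt{(-5)}$ would have norm 2 or 3, so it suffices to see that $a^2 + 5b^2$ takes neither value.

```python
def elements_of_norm(n, bound=10):
    """All pairs (a, b) with a^2 + 5*b^2 == n and |a|, |b| <= bound.

    A pair (a, b) stands for the integer a + b*sqrt(-5).
    """
    return [
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if a * a + 5 * b * b == n
    ]


def multiply(u, v):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5), as a pair."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)


print(elements_of_norm(2), elements_of_norm(3))   # no integers of norm 2 or 3
print(elements_of_norm(1))                        # the unities
print(multiply((2, 0), (3, 0)), multiply((1, 1), (1, -1)))
```

Both products equal 6, and since the only unities are ±1, the factors 2 and 3 are not associates of $1 \pm \sqrt{(-5)}$; the two decompositions are genuinely distinct.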

(ii) Since $10 \equiv 2 \pmod{4}$, the integers of $k(\sqrt{10})$ are $a + b\sqrt{10}$. In this 
case 

$$6 = 2 \cdot 3 = (4 + \sqrt{10})(4 - \sqrt{10}),$$

and it is again easy to prove that all four factors are prime. Thus, for 
example, 

$$2 = (a + b\sqrt{10})(c + d\sqrt{10})$$

implies 

$$4 = (a^2 - 10b^2)(c^2 - 10d^2),$$

and $a^2 - 10b^2$ must be ±2, if neither factor is a unity. This is impossible 
because neither of ±2 is a quadratic residue of 10.† 

† $1^2, 2^2, 3^2, 4^2, 5^2, 6^2, 7^2, 8^2, 9^2 \equiv 1, 4, 9, 6, 5, 6, 9, 4, 1 \pmod{10}$. 

The falsity of the fundamental theorem in these fields involves the falsity 
of other theorems which are central in the arithmetic of k(1). Thus, if $\alpha$ 
and $\beta$ are integers of k(1), without a common factor, there are integers $\lambda$ 
and $\mu$ for which 

$$\alpha\lambda + \beta\mu = 1.$$

This theorem is false in $k\{\sqrt{(-5)}\}$. Suppose, for example, that $\alpha$ and $\beta$ are 
the primes 3 and $1 + \sqrt{(-5)}$. Then 

$$3\{a + b\sqrt{(-5)}\} + \{1 + \sqrt{(-5)}\}\{c + d\sqrt{(-5)}\} = 1$$

involves 

$$3a + c - 5d = 1, \quad 3b + c + d = 0$$

and so 

$$3a - 3b - 6d = 1,$$

which is impossible. 

14.7. Complex Euclidean fields. A simple field is a field in which 
the fundamental theorem is true. The arithmetic of simple fields follows 
the lines of rational arithmetic, while in other cases a new foundation is 
required. The problem of determining all simple fields is very difficult, and 
no complete solution has been found, though Heilbronn has proved that, 
when m is negative, the number of simple fields is finite. 

We proved the fundamental theorem in k(i) and $k(\rho)$ by establishing an 
analogue of Euclid’s algorithm in k(1). Let us suppose, generally, that the 
proposition 

(E) ‘given integers $\gamma$ and $\gamma_1$, with $\gamma_1 \ne 0$, then there is an integer $\kappa$ 
such that 

$$\gamma = \kappa\gamma_1 + \gamma_2, \quad |N\gamma_2| < |N\gamma_1|$$’ 

is true in $k(\sqrt{m})$. This is what we proved, for k(i) and $k(\rho)$, in Theorems 
216 and 219; but we have replaced $N\gamma$ by $|N\gamma|$ in order to include real 
fields. In these circumstances we say that there is a Euclidean algorithm 
in $k(\sqrt{m})$, or that the field is Euclidean. 

We can then repeat the arguments of §§ 12.8 and 12.9 (with the 
substitution of $|N\gamma|$ for $N\gamma$), and we conclude that 

Theorem 245. The fundamental theorem is true in any Euclidean 
quadratic field. 

The conclusion is not confined to quadratic fields, but it is only in such 
fields that we have defined $N\gamma$ and are in a position to state it precisely. 

(E) is plainly equivalent to 

(E′) ‘given any $\delta$ (integral or not) of $k(\sqrt{m})$, there is an integer $\kappa$ such 
that 

(14.7.1) $|N(\delta - \kappa)| < 1$’. 

Suppose now that 

$$\delta = r + s\sqrt{m},$$

where r and s are rational. If $m \not\equiv 1 \pmod{4}$ then 

$$\kappa = x + y\sqrt{m},$$

where x and y are rational integers, and (14.7.1) is 

(14.7.2) $|(r - x)^2 - m(s - y)^2| < 1.$

If $m \equiv 1 \pmod{4}$ then 

$$\kappa = x + y + \tfrac{1}{2}y(\sqrt{m} - 1) = x + \tfrac{1}{2}y + \tfrac{1}{2}y\sqrt{m},$$† 

where x and y are rational integers, and (14.7.1) is 

(14.7.3) $|(r - x - \tfrac{1}{2}y)^2 - \tfrac{1}{4}m(2s - y)^2| < 1.$

When $m = -\mu < 0$, it is easy to determine all fields in which these 
inequalities can be satisfied for any r, s and appropriate x, y. 

Theorem 246. There are just five complex Euclidean quadratic fields, 
viz. the fields in which 

$$m = -1, -2, -3, -7, -11.$$

† The form of § 14.3 with x + y, y for a, b. 

There are two cases. 

(i) When $m \not\equiv 1 \pmod{4}$, we take $r = \tfrac{1}{2}$, $s = \tfrac{1}{2}$ in (14.7.2); and we 
require 

$$\tfrac{1}{4} + \tfrac{1}{4}\mu < 1,$$

or $\mu < 3$. Hence $\mu = 1$ and $\mu = 2$ are the only possible cases; and in these 
cases we can plainly satisfy (14.7.2), for any r and s, by taking x and y to 
be the integers nearest to r and s. 

(ii) When $m \equiv 1 \pmod{4}$ we take $r = \tfrac{1}{4}$, $s = \tfrac{1}{4}$ in (14.7.3). We require 

$$\tfrac{1}{16} + \tfrac{1}{16}\mu < 1.$$

Since $\mu \equiv 3 \pmod{4}$, the only possible values of $\mu$ are 3, 7, 11. Given s, 
there is a y for which 

$$|2s - y| \le \tfrac{1}{2},$$

and an x for which 

$$|r - x - \tfrac{1}{2}y| \le \tfrac{1}{2};$$

and then 

$$|(r - x - \tfrac{1}{2}y)^2 - \tfrac{1}{4}m(2s - y)^2| \le \tfrac{1}{4} + \tfrac{11}{16} = \tfrac{15}{16} < 1.$$

Hence (14.7.3) can be satisfied when $\mu$ has one of the three values in 
question. 
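
Case (i) of the proof is constructive: the quotient is obtained by rounding r and s to the nearest integers. The Python sketch below (ours; the function names are hypothetical, and it covers only m = -1 and m = -2, where the integers are $a + b\sqrt{m}$) carries out this division and then Euclid's algorithm. Integers are represented as pairs (a, b).

```python
import math
from fractions import Fraction


def norm(z, m):
    """N(a + b*sqrt(m)) = a^2 - m*b^2 for z = (a, b)."""
    a, b = z
    return a * a - m * b * b


def nearest_integer(q):
    """A rational integer within 1/2 of the Fraction q."""
    return math.floor(q + Fraction(1, 2))


def divide(gamma, gamma1, m):
    """Return (kappa, gamma2) with gamma = kappa*gamma1 + gamma2, N(gamma2) < N(gamma1)."""
    a, b = gamma
    c, d = gamma1
    n1 = norm(gamma1, m)
    # gamma / gamma1 = gamma * conj(gamma1) / N(gamma1) = r + s*sqrt(m)
    r = Fraction(a * c - m * b * d, n1)
    s = Fraction(b * c - a * d, n1)
    x, y = nearest_integer(r), nearest_integer(s)
    # gamma2 = gamma - (x + y*sqrt(m)) * (c + d*sqrt(m))
    gamma2 = (a - (x * c + m * y * d), b - (x * d + y * c))
    return (x, y), gamma2


def gcd(alpha, beta, m):
    """A highest common divisor of alpha and beta (determined up to a unity)."""
    while beta != (0, 0):
        _, remainder = divide(alpha, beta, m)
        alpha, beta = beta, remainder
    return alpha


print(divide((7, 3), (2, 1), -1))
print(gcd((5, 0), (2, 1), -1), gcd((3, 0), (1, 1), -2))
```

The loop in `gcd` terminates because the norm of the remainder strictly decreases at each step, which is exactly what property (E) provides.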

There are other simple fields, such as $k\{\sqrt{(-19)}\}$ and $k\{\sqrt{(-43)}\}$, which 
do not possess an algorithm; the condition is sufficient but not necessary 
for simplicity. There are just nine simple complex quadratic fields, viz. 
those corresponding to 

$$m = -1, -2, -3, -7, -11, -19, -43, -67, -163.$$

14.8. Real Euclidean fields. The real fields with an algorithm are more 
numerous. 

Theorem 247*. $k(\sqrt{m})$ is Euclidean when 

$$m = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73$$

and for no other positive m. 

We can plainly satisfy (14.7.2) when m = 2 or m = 3, since we can 
choose x and y so that $|r - x| \le \tfrac{1}{2}$ and $|s - y| \le \tfrac{1}{2}$. Hence $k(\sqrt{2})$ and 
$k(\sqrt{3})$ are Euclidean, and therefore simple. We cannot prove Theorem 247 
here, but we shall prove 

Theorem 248. $k(\sqrt{m})$ is Euclidean when 

$$m = 2, 3, 5, 6, 7, 13, 17, 21, 29.$$

If we write 

$$\lambda = 0, \quad n = m \quad (m \not\equiv 1 \pmod{4}),$$
$$\lambda = \tfrac{1}{2}, \quad n = \tfrac{1}{4}m \quad (m \equiv 1 \pmod{4}),$$

and replace 2s by s when $m \equiv 1$, then we can combine (14.7.2) and (14.7.3) 
in the form 

(14.8.1) $|(r - x - \lambda y)^2 - n(s - y)^2| < 1.$

Let us assume that there is no algorithm in $k(\sqrt{m})$. Then (14.8.1) is false 
for some rational r, s and all integral x, y; and we may suppose that† 

(14.8.2) $0 \le r \le \tfrac{1}{2}, \quad 0 \le s \le \tfrac{1}{2}.$

† This is very easy to see when $m \not\equiv 1 \pmod{4}$ and the left-hand side of (14.8.1) is 

$$|(r - x)^2 - m(s - y)^2|;$$

for this is unaltered if we write 

$$\varepsilon_1r + u, \quad \varepsilon_1x + u, \quad \varepsilon_2s + v, \quad \varepsilon_2y + v,$$

where $\varepsilon_1$ and $\varepsilon_2$ are each 1 or -1, and u and v are integers, for 

$$r, \quad x, \quad s, \quad y;$$

and we can always choose $\varepsilon_1$, $\varepsilon_2$, u, v so that $\varepsilon_1r + u$ and $\varepsilon_2s + v$ lie between 0 and $\tfrac{1}{2}$ inclusive. 

The situation is a little more complex when $m \equiv 1 \pmod{4}$ and the left-hand side of (14.8.1) is 

$$|(r - x - \tfrac{1}{2}y)^2 - \tfrac{1}{4}m(s - y)^2|.$$

This is unaltered by the substitution of any of 

(1) $\varepsilon_1r + u, \quad \varepsilon_1x + u, \quad \varepsilon_1s, \quad \varepsilon_1y,$ 
(2) $r, \quad x - v, \quad s + 2v, \quad y + 2v,$ 
(3) $r, \quad x + y, \quad -s, \quad -y,$ 
(4) $\tfrac{1}{2} - r, \quad -x, \quad 1 - s, \quad 1 - y,$ 

for r, x, s, y. We first use (1) to make $0 \le r \le \tfrac{1}{2}$; then (2) to make $-1 \le s \le 1$; and then, if necessary, 
(3) to make $0 \le s \le 1$. If then $0 \le s \le \tfrac{1}{2}$, the reduction is completed. If $\tfrac{1}{2} \le s \le 1$, we end by using 
(4), as we can do because $\tfrac{1}{2} - r$ lies between 0 and $\tfrac{1}{2}$ if r does so. 

There is therefore a pair r, s satisfying (14.8.2), such that one or other of 

[P(x, y)] $(r - x - \lambda y)^2 \ge 1 + n(s - y)^2,$ 
[N(x, y)] $n(s - y)^2 \ge 1 + (r - x - \lambda y)^2$ 

is true for every x, y. The particular inequalities which we shall use are 

[P(0, 0)] $r^2 \ge 1 + ns^2,$ [N(0, 0)] $ns^2 \ge 1 + r^2,$ 
[P(1, 0)] $(1 - r)^2 \ge 1 + ns^2,$ [N(1, 0)] $ns^2 \ge 1 + (1 - r)^2,$ 
[P(-1, 0)] $(1 + r)^2 \ge 1 + ns^2,$ [N(-1, 0)] $ns^2 \ge 1 + (1 + r)^2.$ 

One at least of each of these pairs of inequalities is true for some r and s 
satisfying (14.8.2). If r = s = 0, P(0, 0) and N(0, 0) are both false, so that 
this possibility is excluded. 

Since r and s satisfy (14.8.2), and are not both 0, P(0, 0) and P(1, 0) are 
false; and therefore N(0, 0) and N(1, 0) are true. If P(-1, 0) were true, 
then N(1, 0) and P(-1, 0) would give 

$$(1 + r)^2 \ge 1 + ns^2 \ge 2 + (1 - r)^2$$

and so $4r \ge 2$. From this and (14.8.2) it would follow that $r = \tfrac{1}{2}$ and 
$ns^2 = \tfrac{5}{4}$, which is impossible.‡ Hence P(-1, 0) is false, and therefore 
N(-1, 0) is true. This gives 

$$ns^2 \ge 1 + (1 + r)^2 \ge 2,$$

and this and (14.8.2) give $n \ge 8$. 

It follows that there is an algorithm in all cases in which n < 8, and these 
are the cases enumerated in Theorem 248. 
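
The condition n < 8 can be turned into the list of Theorem 248 mechanically. In the sketch below (our own illustration, with hypothetical function names) we recall that n = m when $m \not\equiv 1 \pmod{4}$ and $n = \tfrac{1}{4}m$ when $m \equiv 1 \pmod{4}$, so the condition reads m < 8 or m < 32 respectively, with m square-free.

```python
def is_squarefree(m):
    """True if no square greater than 1 divides the positive integer m."""
    d = 2
    while d * d <= m:
        if m % (d * d) == 0:
            return False
        d += 1
    return True


def theorem_248_values():
    """Square-free m >= 2 for which n < 8 in the notation of (14.8.1)."""
    values = []
    for m in range(2, 32):                   # n < 8 forces m < 32 in every case
        if not is_squarefree(m):
            continue
        limit = 32 if m % 4 == 1 else 8      # n = m/4 or n = m
        if m < limit:
            values.append(m)
    return values


print(theorem_248_values())
```

The result is the list 2, 3, 5, 6, 7, 13, 17, 21, 29 stated in Theorem 248.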

‡ Suppose that s = p/q, where (p, q) = 1. If $m \not\equiv 1 \pmod{4}$, then m = n and 

$$4mp^2 = 5q^2.$$

Hence $p^2 \mid 5$, so that p = 1; and $q^2 \mid 4m$. But m has no squared factor, and $0 < s \le \tfrac{1}{2}$. Hence q = 2, 
$s = \tfrac{1}{2}$ and $m = 5 \equiv 1 \pmod{4}$, a contradiction. 

If $m \equiv 1 \pmod{4}$, then m = 4n and 

$$mp^2 = 5q^2.$$

From this we deduce p = 1, q = 1, s = 1, in contradiction to (14.8.2). 

There is no algorithm when m = 23. Take r = 0, $s = \tfrac{7}{23}$. Then 
(14.8.1) is 

$$|23x^2 - (23y - 7)^2| < 23.$$

Since 

$$\xi = 23x^2 - (23y - 7)^2 \equiv -49 \equiv -3 \pmod{23},$$

$\xi$ must be -3 or 20, and it is easy to see that each of these hypotheses is 
impossible. Suppose, for example, that 

$$\xi = 23X^2 - Y^2 = -3.$$

Then neither X nor Y can be divisible by 3, and 

$$X^2 \equiv 1, \quad Y^2 \equiv 1, \quad \xi \equiv 22 \equiv 1 \pmod{3},$$

a contradiction. 
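
Both hypotheses can be excluded by a finite search over residues. The sketch below is ours; the choice of the moduli 9 and 25 is an assumption of this illustration (the text argues modulo 3 for the value -3 and leaves the value 20 to the reader).

```python
def solvable_mod(target, modulus):
    """Is 23*X^2 - Y^2 == target (mod modulus) solvable in residues X, Y?"""
    return any(
        (23 * X * X - Y * Y - target) % modulus == 0
        for X in range(modulus)
        for Y in range(modulus)
    )


print(solvable_mod(-3, 3))    # solvable modulo 3 alone (take X = Y = 0)
print(solvable_mod(-3, 9))    # but not modulo 9
print(solvable_mod(20, 25))   # and the value 20 is impossible modulo 25
```

Modulo 3 alone the congruence for -3 has the solution X ≡ Y ≡ 0, which is why the text first observes that neither X nor Y can be divisible by 3; working modulo 9 absorbs that step into the search.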

The field $k(\sqrt{23})$, though not Euclidean, is simple; but we cannot prove 
this here. 

14.9. Real Euclidean fields (continued). It is naturally more difficult 
to prove that $k(\sqrt{m})$ is not Euclidean for all positive m except those listed 
in Theorem 247, than to prove $k(\sqrt{m})$ Euclidean for particular values of 
m. In this direction we prove only 

Theorem 249. The number of real Euclidean fields $k(\sqrt{m})$, where $m \equiv 2$ 
or $3 \pmod{4}$, is finite. 

Let us suppose $k(\sqrt{m})$ Euclidean and $m \not\equiv 1 \pmod{4}$. We take r = 0 and 
s = t/m in (14.7.2), where t is an integer to be chosen later. Then there are 
rational integers x, y such that 

$$\left|x^2 - m\left(y - \frac{t}{m}\right)^2\right| < 1, \quad |(my - t)^2 - mx^2| < m.$$

Since 

$$(my - t)^2 - mx^2 \equiv t^2 \pmod{m},$$

there are rational integers x, z such that 

(14.9.1) $z^2 - mx^2 \equiv t^2 \pmod{m}, \quad |z^2 - mx^2| < m.$

If $m \equiv 3 \pmod{4}$, we choose t an odd integer such that 

$$5m < t^2 < 6m,$$

as we certainly can do if m is large enough. By (14.9.1), $z^2 - mx^2$ is equal 
to $t^2 - 5m$ or to $t^2 - 6m$, so that one of 

(14.9.2) $t^2 - z^2 = m(5 - x^2), \quad t^2 - z^2 = m(6 - x^2)$

is true. But, to modulus 8, 

$$t^2 \equiv 1, \quad z^2, x^2 \equiv 0, 1, \text{ or } 4, \quad m \equiv 3 \text{ or } 7;$$
$$t^2 - z^2 \equiv 0, 1, \text{ or } 5,$$
$$5 - x^2 \equiv 1, 4, \text{ or } 5; \quad 6 - x^2 \equiv 2, 5, \text{ or } 6;$$
$$m(5 - x^2) \equiv 3, 4, \text{ or } 7; \quad m(6 - x^2) \equiv 2, 3, 6, \text{ or } 7;$$

and, however we choose the residues, each of (14.9.2) is impossible. 

If $m \equiv 2 \pmod{4}$, we choose t odd and such that $2m < t^2 < 3m$, as we 
can if m is large enough. In this case, one of 

(14.9.3) $t^2 - z^2 = m(2 - x^2), \quad t^2 - z^2 = m(3 - x^2)$

is true. But, to modulus 8, $m \equiv 2$ or 6: 

$$2 - x^2 \equiv 1, 2, \text{ or } 6; \quad 3 - x^2 \equiv 2, 3, \text{ or } 7;$$
$$m(2 - x^2) \equiv 2, 4, \text{ or } 6; \quad m(3 - x^2) \equiv 2, 4, \text{ or } 6;$$

and each of (14.9.3) is impossible. 
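
The residue tables used in this proof can be regenerated by machine. The Python sketch below is our own check (the function names are hypothetical); it lists the residues modulo 8 of each side of (14.9.2) and (14.9.3) and confirms that they never meet.

```python
SQUARES_MOD_8 = {0, 1, 4}   # possible residues of x^2 and z^2 modulo 8


def left_side():
    """Residues of t^2 - z^2 modulo 8 for odd t, so that t^2 = 1 (mod 8)."""
    return sorted({(1 - z2) % 8 for z2 in SQUARES_MOD_8})


def right_side(k, m_residues):
    """Residues of m*(k - x^2) modulo 8 for m in the given residue classes."""
    return sorted({(m * (k - x2)) % 8 for m in m_residues for x2 in SQUARES_MOD_8})


def compatible(k, m_residues):
    """Can t^2 - z^2 = m*(k - x^2) hold modulo 8?"""
    return bool(set(left_side()) & set(right_side(k, m_residues)))


print(left_side())
print(right_side(5, (3, 7)), right_side(6, (3, 7)))   # m = 3 (mod 4)
print(right_side(2, (2, 6)), right_side(3, (2, 6)))   # m = 2 (mod 4)
```

In all four cases `compatible` returns False, which is the impossibility asserted for (14.9.2) and (14.9.3).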

Hence, if $m \equiv 2$ or $3 \pmod{4}$ and if m is large enough, $k(\sqrt{m})$ cannot 
be Euclidean. This is Theorem 249. The same is, of course, true for $m \equiv 1$, 
but the proof is distinctly more difficult. 

NOTES 


The terminology and notation of this chapter have become out of date since it was originally 
written. In particular it has become customary to write $\mathbb{Q}(\sqrt{m})$ rather than $k(\sqrt{m})$, and to 
refer to ‘units’ rather than ‘unities’. Moreover, one usually says that the ring of integers of a 
field is a ‘unique factorization domain’, rather than calling the field ‘simple’. The property 
(E) in § 14.7 is generally referred to by saying that the field is ‘Norm-Euclidean’. We say 
that the field (or its ring of integers) is ‘Euclidean’ if there is any function $\phi$ whatsoever, 
defined on the non-zero integers of the field and taking positive integer values, with the 
following two properties. 

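(i) If $\gamma_1$ and $\gamma_2$ are non-zero integers with $\gamma_1 \mid \gamma_2$, then $\phi(\gamma_1) \le \phi(\gamma_2)$. 

(ii) If $\gamma_1$ and $\gamma_2$ are non-zero integers with $\gamma_2 \nmid \gamma_1$, then there is an integer $\kappa$ such that 
$\phi(\gamma_1 - \kappa\gamma_2) < \phi(\gamma_2)$. 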

We shall follow this terminology for the two notions of Euclidean field for the remainder 
of the notes on this chapter. 

§§ 14.1-6. The theory of quadratic fields is developed in detail in Bachmann’s 
Grundlehren der neueren Zahlentheorie (Göschens Lehrbücherei, no. 3, ed. 2, 1931) and 
Sommer’s Vorlesungen über Zahlentheorie. There is a French translation of Sommer’s 
book, with the title Introduction à la théorie des nombres algébriques (Paris, 1911); and 
a more elementary account of the theory, with many numerical examples, in Reid’s The 
elements of the theory of algebraic numbers (New York, 1910). 

§ 14.5. The equation $x^2 - my^2 = 1$ is usually called Pell’s equation, but this is the result 
of a misunderstanding. See Dickson, History, ii, ch. xii, especially pp. 341, 351, 354. 
There is a very full account of the history of the equation in Whitford’s The Pell equation 
(New York, 1912). 

§ 14.7. Theorem 245 is true for Euclidean fields in general, and not merely for 
Norm-Euclidean fields. This can be proved by the arguments of §§ 12.8 and 12.9. Theorem 246 
refers to the Norm-Euclidean property, but in fact there are no further complex quadratic 
Euclidean fields, even with the wider definition given at the start of these notes; see Samuel 
(J. Algebra, 19 (1971), 282-301). 

Heilbronn and Linfoot (Quarterly Journal of Math. (Oxford), 5 (1934), 150-60 and 
293-301) proved that there was at most one simple complex quadratic field other than 
those listed at the end of § 14.7. Stark (Michigan Math. J. 14 (1967), 1-27) proved that 
this extra field did not exist. Baker (ch. 5) showed that the same result followed from his 
approach to transcendence. 

An earlier approach to this problem by Heegner (Math. Zeit. 56 (1952), 227-53) had 
originally been supposed incomplete, but was later found to be essentially correct. 

§§ 14.8-9. Theorem 247, which refers to Norm-Euclidean fields, is essentially due to 
Chatland and Davenport [Canadian Journal of Math. 2 (1950), 289-96]. Davenport [Proc. 
London Math. Soc. (2) 53 (1951), 65-82] showed that $k(\sqrt{m})$ cannot be Norm-Euclidean if 
$m > 2^{14} = 16384$, which reduced the proof of Theorem 247 to the study of a finite number 
of values of m. Chatland [Bulletin Amer. Math. Soc. 55 (1949), 948-53] gives a list of 
references to previous results, including a mistaken announcement by another that $k(\sqrt{97})$ 
was Norm-Euclidean. Barnes and Swinnerton-Dyer [Acta Math. 87 (1952), 259-323] show 
that $k(\sqrt{97})$ is not, in fact, Norm-Euclidean. 

Our proof of Theorem 248 is due to Oppenheim, Math. Annalen 109 (1934), 349-52, and 
that of Theorem 249 to E. Berg, Fysiogr. Sällsk. Lund Förh. 5 (1935), 1-6. Both theorems 
relate to the Norm-Euclidean property. 

It has been shown by Harper (Canad. J. Math. 56 (2004), 55-70) that the field 
$k(\sqrt{14})$ is Euclidean, and hence the integers satisfy the fundamental theorem, even though 
it is not Norm-Euclidean. It is conjectured that there are infinitely many real quadratic fields 
with the unique factorization property, and that they are all Euclidean, although only those 
listed in Theorem 247 can be Norm-Euclidean. 

When p is a prime there appear to be a large number of fields $k(\sqrt{p})$ with the unique 
factorization property. Indeed Cohen and Lenstra (Number theory, Noordwijkerhout 1983, 
Springer Lecture Notes in Math. 1068, 33-62) have given heuristics leading to a precise 
conjecture, which would show that $k(\sqrt{p})$ has the unique factorization property for 
asymptotically a positive proportion of primes. 

We expect an infinity of real quadratic fields with the unique factorization property. 
However, if we restrict attention to square-free integers m for which there is a small 
non-trivial unit, then the picture changes. Thus, for square-free numbers m of the form 
$m = 4r^2 + 1$, there is a ‘small’ unit $2r + \sqrt m$, and it has been shown by Biró (Acta Arith. 107 
(2003), 179-94) that in this case one obtains a unique factorization domain if and only if 
r = 1, 2, 3, 5, 7 or 13. 
15.1. The primes of k(i). We begin this chapter by determining the 
primes of k(i) and a few other simple quadratic fields. 

If $\pi$ is a prime of $k(\sqrt m)$, then 

$$\pi \mid N\pi = \pi\bar\pi$$

and $\pi \mid |N\pi|$. There are therefore positive rational integers divisible by $\pi$. 
If z is the least such integer, $z = z_1 z_2$, and the field is simple, then 

$$\pi \mid z_1 z_2 \to \pi \mid z_1 \ \text{or}\ \pi \mid z_2,$$

a contradiction unless $z_1$ or $z_2$ is 1. Hence z is a rational prime. Thus $\pi$ 
divides at least one rational prime p. If it divides two, say p and p', then 

$$\pi \mid p \,.\, \pi \mid p' \to \pi \mid px - p'y = 1$$

for appropriate x and y, a contradiction. 

Theorem 250. Any prime $\pi$ of a simple field $k(\sqrt m)$ is a divisor of just 
one positive rational prime. 

The primes of a simple field are therefore to be determined by the 
factorization, in the field, of rational primes. 

We consider k(i) first. If 

$$\pi = a + bi \mid p, \quad \pi\lambda = p,$$

then 

$$N\pi\, N\lambda = p^2.$$

Either $N\lambda = 1$, when $\lambda$ is a unity and $\pi$ an associate of p, or 

$$N\pi = a^2 + b^2 = p. \tag{15.1.1}$$

(i) If p = 2, then 

$$p = 1^2 + 1^2 = (1 + i)(1 - i) = i(1 - i)^2.$$

The numbers $1 + i$, $-1 + i$, $-1 - i$, $1 - i$ (which are associates) are primes 
of k(i). 
(ii) If p = 4n + 3, (15.1.1) is impossible, since a square is congruent to 
0 or 1 (mod 4). Hence the primes 4n + 3 are primes of k(i). 

(iii) If p = 4n + 1, then 

$$\left(\frac{-1}{p}\right) = 1,$$

by Theorem 82, and there is an x for which 

$$p \mid x^2 + 1, \quad p \mid (x + i)(x - i).$$

If p were a prime of k(i), it would divide $x + i$ or $x - i$, and this is false, 
since the numbers 

$$\frac{x}{p} \pm \frac{i}{p}$$

are not integers. Hence p is not a prime. It follows that $p = \pi\lambda$, where 
$\pi = a + bi$, $\lambda = a - bi$, and 

$$N\pi = a^2 + b^2 = p.$$

In this case p can be expressed as a sum of two squares. 

The prime divisors of p are 

$$\pi,\ i\pi,\ -\pi,\ -i\pi,\ \lambda,\ i\lambda,\ -\lambda,\ -i\lambda, \tag{15.1.2}$$

and any of these numbers may be substituted for $\pi$. The eight variations 
correspond to the eight equations 

$$(\pm a)^2 + (\pm b)^2 = (\pm b)^2 + (\pm a)^2 = p. \tag{15.1.3}$$

And if $p = c^2 + d^2$ then $c + id \mid p$, so that $c + id$ is one of the numbers 
(15.1.2). Hence, apart from these variations, the expression of p as a sum 
of squares is unique. 

Theorem 251. A rational prime p = 4n + 1 can be expressed as a sum 
$a^2 + b^2$ of two squares. 

Theorem 252. The primes of k(i) are 

(1) $1 + i$ and its associates, 

(2) the rational primes 4n + 3 and their associates, 

(3) the factors $a + bi$ of the rational primes 4n + 1. 
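
The classification in Theorem 252 can be explored numerically. The following is only a minimal sketch, assuming small primes and a brute-force search; the function names `is_prime`, `two_squares` and `classify` are illustrative and do not come from the text.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality check; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def two_squares(p):
    """Return (a, b) with a >= b >= 1 and a*a + b*b == p, or None."""
    for b in range(1, isqrt(p // 2) + 1):   # b <= a forces 2*b*b <= p
        a = isqrt(p - b * b)
        if a * a + b * b == p:
            return (a, b)
    return None

def classify(p):
    """Describe how the rational prime p behaves in k(i) (Theorem 252)."""
    if p == 2:
        return "2 = i(1 - i)^2, so 1 + i and its associates divide 2"
    if p % 4 == 3:
        return f"{p} remains prime in k(i)"
    a, b = two_squares(p)
    return f"{p} = ({a} + {b}i)({a} - {b}i)"

for p in (2, 3, 5, 13, 19, 29):
    print(classify(p))
```

Restricting the search to a >= b >= 1 picks one representative out of the eight variations (15.1.3), which is why at most one pair is ever found for a prime.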
15.2. Fermat’s theorem in k(i). As an illustration of the arithmetic of 
k(i), we select the analogue of Fermat’s theorem. We consider only the 
analogue of Theorem 71 and not that of the more general Fermat-Euler 
theorem. It may be worth repeating that $\gamma \mid (\alpha - \beta)$ and 

$$\alpha \equiv \beta \pmod{\gamma}$$

mean, when we are working in the field $k(\vartheta)$, that $\alpha - \beta = \kappa\gamma$, where $\kappa$ 
is an integer of the field. 

We denote rational primes 4n + 1 and 4n + 3 by p and q respectively, 
and a prime of k(i) by $\pi$. We confine our attention to primes of the classes 
(2) and (3), i.e. primes whose norm is odd; thus $\pi$ is a q or a divisor of a p. 
We write 

$$\phi(\pi) = N\pi - 1,$$

so that 

$$\phi(\pi) = p - 1 \ (\pi \mid p), \quad \phi(\pi) = q^2 - 1 \ (\pi = q).$$

Theorem 253. If $(\alpha, \pi) = 1$, then 

$$\alpha^{\phi(\pi)} \equiv 1 \pmod{\pi}.$$

Suppose that $\alpha = l + im$. Then, when $\pi \mid p$, $i^p = i$ and 

$$\alpha^p = (l + im)^p \equiv l^p + (im)^p = l^p + i m^p \pmod p,$$

by Theorem 75; and so 

$$\alpha^p \equiv l + im = \alpha \pmod p,$$

by Theorem 70. The same congruence is true mod $\pi$, and we may remove 
the factor $\alpha$. 

When $\pi = q$, $i^q = -i$ and 

$$\alpha^q = (l + im)^q \equiv l^q - i m^q \equiv l - im = \bar\alpha \pmod q.$$

Similarly, $\bar\alpha^q \equiv \alpha$, so that 

$$\alpha^{q^2} \equiv \alpha, \quad \alpha^{q^2 - 1} \equiv 1 \pmod q.$$
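
Theorem 253 can be checked by brute force for small primes. The sketch below is illustrative only: it represents a Gaussian integer x + yi as a pair `(x, y)`, and the helper names (`gmul`, `gpow`, `divisible_by`, `fermat_holds_q`, `fermat_holds_pi`) are assumptions of this example, not notation from the text. Divisibility by a prime $\pi$ of norm p is tested through the identity $\beta/\pi = \beta\bar\pi/p$.

```python
def gmul(u, v, m):
    """Multiply Gaussian integers u = (x, y) and v = (x', y') modulo the rational integer m."""
    return ((u[0] * v[0] - u[1] * v[1]) % m, (u[0] * v[1] + u[1] * v[0]) % m)

def gpow(u, e, m):
    """Compute u**e modulo m by repeated squaring."""
    result = (1 % m, 0)
    while e:
        if e & 1:
            result = gmul(result, u, m)
        u = gmul(u, u, m)
        e >>= 1
    return result

def divisible_by(beta, pi, p):
    """Test pi | beta, where N(pi) = p and beta is known modulo p.

    Since beta / pi = beta * conj(pi) / p, pi divides beta exactly when
    beta * conj(pi) is congruent to 0 (mod p).
    """
    conj = (pi[0], -pi[1])
    return gmul(beta, conj, p) == (0, 0)

def fermat_holds_q(q):
    """Check alpha**(q*q - 1) == 1 (mod q) for every non-zero residue alpha (mod q)."""
    return all(gpow((x, y), q * q - 1, q) == (1, 0)
               for x in range(q) for y in range(q) if (x, y) != (0, 0))

def fermat_holds_pi(pi):
    """Check alpha**(p - 1) == 1 (mod pi), with p = N(pi), for every alpha prime to pi."""
    p = pi[0] ** 2 + pi[1] ** 2
    for x in range(p):
        for y in range(p):
            alpha = (x, y)
            if divisible_by(alpha, pi, p):
                continue          # (alpha, pi) != 1, so the theorem does not apply
            power = gpow(alpha, p - 1, p)
            if not divisible_by((power[0] - 1, power[1]), pi, p):
                return False
    return True

print(fermat_holds_q(7), fermat_holds_pi((3, 2)))
```

Calling `fermat_holds_q` with a prime 4n + 1, such as 5, fails, because such a number is not a prime of k(i) and some non-zero residues are then not prime to it.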
The theorem can also be proved on lines corresponding to those of § 6.1. 
Suppose for example that $\pi = a + bi \mid p$. The number 

$$(a + bi)(c + di) = ac - bd + i(ad + bc)$$

is a multiple of $\pi$ and, since (a, b) = 1, we can choose c and d so that 
$ad + bc = 1$. Hence there is an s such that 

$$\pi \mid s + i.$$

Now consider the numbers 

$$r = 0, 1, 2, \ldots, N\pi - 1 = a^2 + b^2 - 1,$$

which are plainly incongruent (mod $\pi$). If $x + yi$ is any integer of k(i), 
there is an r for which 

$$x - sy \equiv r \pmod{N\pi};$$

and then 

$$x + yi \equiv y(s + i) + r \equiv r \pmod{\pi}.$$

Hence the r form a ‘complete system of residues’ (mod $\pi$). 

If $\alpha$ is prime to $\pi$, then, as in rational arithmetic, the numbers $\alpha r$ also 
form a complete system of residues.† Hence 

$$\prod (\alpha r) \equiv \prod r \pmod{\pi},$$

and the theorem follows as in § 6.1. 

The proof in the other case is similar, but the ‘complete system’ is 
constructed differently. 

15.3. The primes of k(ρ). The primes of $k(\rho)$ are also factors of 
rational primes, and there are again three cases. 

(1) If p = 3, then 

$$p = (1 - \rho)(1 - \rho^2) = (1 + \rho)(1 - \rho)^2 = -\rho^2 (1 - \rho)^2.$$

By Theorem 221, $1 - \rho$ is a prime. 

† Compare Theorem 58. The proof is essentially the same. 
(2) If p ≡ 2 (mod 3) then it is impossible that $N\pi = p$, since 

$$4N\pi = (2a - b)^2 + 3b^2$$

is congruent to 0 or 1 (mod 3). Hence p is a prime in $k(\rho)$. 

(3) If p ≡ 1 (mod 3) then 

$$\left(\frac{-3}{p}\right) = 1,$$

by Theorem 96, and $p \mid x^2 + 3$. It then follows as in § 15.1 that p is divisible 
by a prime $\pi = a + b\rho$, and that 

$$p = N\pi = a^2 - ab + b^2.$$

Theorem 254. A rational prime 3n + 1 is expressible in the form 
$a^2 - ab + b^2$. 

Theorem 255. The primes of $k(\rho)$ are 

(1) $1 - \rho$ and its associates, 

(2) the rational primes 3n + 2 and their associates, 

(3) the factors $a + b\rho$ of the rational primes 3n + 1. 
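
Theorem 254 lends itself to the same kind of numerical check as Theorem 251. This is a brute-force sketch under the assumption that p is small; `eisenstein_form` and `is_prime` are hypothetical names chosen for the example.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality check; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def eisenstein_form(p):
    """Return (a, b) with a >= b >= 0 and a*a - a*b + b*b == p, or None."""
    # For 0 <= b <= a the form is at least 3*a*a/4, which bounds the search in a.
    for a in range(1, isqrt(4 * p // 3) + 2):
        for b in range(a + 1):
            if a * a - a * b + b * b == p:
                return (a, b)
    return None

for p in (7, 13, 19, 31, 5, 11):
    print(p, eisenstein_form(p))
```

The restriction a >= b >= 0 loses nothing: multiplying $a + b\rho$ by a suitable one of the six unities of $k(\rho)$ rotates it into that sector without changing its norm.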

15.4. The primes of k(√2) and k(√5). The discussion goes similarly 
in other simple fields. In $k(\sqrt 2)$, for example, either p is prime or 

$$N\pi = a^2 - 2b^2 = \pm p. \tag{15.4.1}$$

Every square is congruent to 0, 1, or 4 (mod 8), and (15.4.1) is impossible 
when p is 8n ± 3. When p is 8n ± 1, 2 is a quadratic residue of p by 
Theorem 95, and we show as before that p is factorizable. Finally 

$$2 = (\sqrt 2)^2,$$

and $\sqrt 2$ is prime. 

Theorem 256. The primes of $k(\sqrt 2)$ are (1) $\sqrt 2$, (2) the rational primes 
8n ± 3, (3) the factors $a + b\sqrt 2$ of rational primes 8n ± 1 (and the associates 
of these numbers). 
We consider one more example because we require the results in § 15.5. 
The integers of $k(\sqrt 5)$ are the numbers $a + b\omega$, where a and b are rational 
integers and 

$$\omega = \tfrac12 (1 + \sqrt 5). \tag{15.4.2}$$

The norm of $a + b\omega$ is $a^2 + ab - b^2$. The numbers 

$$\pm\omega^{\pm n} \quad (n = 0, 1, 2, \ldots) \tag{15.4.3}$$

are unities, and we can prove as in § 14.5 that there are no more. 

The determination of the primes depends upon the equation 

$$N\pi = a^2 + ab - b^2 = p,$$

or 

$$(2a + b)^2 - 5b^2 = 4p.$$

If p = 5n ± 2, then $(2a + b)^2 \equiv \pm 3 \pmod 5$, which is impossible. Hence 
these primes are primes in $k(\sqrt 5)$. 
If p = 5n ± 1, then 

$$\left(\frac{5}{p}\right) = 1,$$

by Theorem 97. Hence $p \mid (x^2 - 5)$ for some x, and we conclude as before 
that p is factorizable. Finally 

$$5 = (\sqrt 5)^2 = (2\omega - 1)^2.$$

Theorem 257. The unities of $k(\sqrt 5)$ are the numbers (15.4.3). The 
primes are (1) $\sqrt 5$, (2) the rational primes 5n ± 2, (3) the factors $a + b\omega$ 
of rational primes 5n ± 1 (and the associates of these numbers). 

We shall also need the analogue of Fermat’s theorem. 

Theorem 258. If p and q are the rational primes 5n ± 1 and 5n ± 2 
respectively; $\phi(\pi) = |N\pi| - 1$, so that 

$$\phi(\pi) = p - 1 \ (\pi \mid p), \quad \phi(\pi) = q^2 - 1 \ (\pi = q);$$

and $(\alpha, \pi) = 1$; then 

$$\alpha^{\phi(\pi)} \equiv 1 \pmod{\pi}, \tag{15.4.4}$$
$$\alpha^{p-1} \equiv 1 \pmod{\pi}, \tag{15.4.5}$$
$$\alpha^{q+1} \equiv N\alpha \pmod q. \tag{15.4.6}$$

Further, if $\pi \mid p$, $\bar\pi$ is the conjugate of $\pi$, $(\alpha, \pi) = 1$ and $(\alpha, \bar\pi) = 1$, then 

$$\alpha^{p-1} \equiv 1 \pmod p. \tag{15.4.7}$$

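First, if 

$$2\alpha = c + d\sqrt 5,$$

then 

$$2\alpha^p \equiv (2\alpha)^p = (c + d\sqrt 5)^p \equiv c^p + d^p\, 5^{\frac12(p-1)} \sqrt 5 \pmod p.$$

But 

$$5^{\frac12(p-1)} \equiv \left(\frac{5}{p}\right) = 1 \pmod p,$$

$c^p \equiv c$ and $d^p \equiv d$. Hence 

$$2\alpha^p \equiv c + d\sqrt 5 = 2\alpha \pmod p, \tag{15.4.8}$$

and, a fortiori, 

$$2\alpha^p \equiv 2\alpha \pmod{\pi}. \tag{15.4.9}$$

Since $(2, \pi) = 1$ and $(\alpha, \pi) = 1$, we may divide by $2\alpha$, and obtain (15.4.5). 
If also $(\alpha, \bar\pi) = 1$, so that $(\alpha, p) = 1$, then we may divide (15.4.8) by $2\alpha$, 
and obtain (15.4.7). 

Similarly, if q > 2, 

$$2\alpha^q \equiv c - d\sqrt 5 = 2\bar\alpha, \quad \alpha^q \equiv \bar\alpha \pmod q, \tag{15.4.10}$$
$$\alpha^{q+1} \equiv \alpha\bar\alpha = N\alpha \pmod q. \tag{15.4.11}$$

This proves (15.4.6). Also (15.4.10) involves 

$$\alpha^{q^2} \equiv \bar\alpha^q \equiv \alpha \pmod q,$$
$$\alpha^{q^2 - 1} \equiv 1 \pmod q. \tag{15.4.12}$$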
Finally (15.4.5) and (15.4.12) together contain (15.4.4). 

The proof fails if q = 2, but (15.4.4) and (15.4.6) are still true. If 
$\alpha = e + f\omega$ then one of e and f is odd, and therefore $N\alpha = e^2 + ef - f^2$ 
is odd. Also, to modulus 2, 

$$\alpha^2 \equiv e^2 + f^2\omega^2 \equiv e + f\omega^2 = e + f(\omega + 1) \equiv e + f(1 - \omega) = e + f\bar\omega = \bar\alpha$$

and 

$$\alpha^3 \equiv \alpha\bar\alpha = N\alpha \equiv 1.$$

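We note in passing that our results give incidentally another proof of Theorem 180. 
The nth Fibonacci number is 

$$u_n = \frac{\omega^n - \bar\omega^n}{\omega - \bar\omega} = \frac{\omega^n - \bar\omega^n}{\sqrt 5},$$

where $\omega$ is the number (15.4.2) and $\bar\omega = -1/\omega$ is its conjugate. 

If n = p, then 

$$\omega^{p-1} \equiv 1 \pmod p, \quad \bar\omega^{p-1} \equiv 1 \pmod p,$$
$$u_{p-1}\sqrt 5 = \omega^{p-1} - \bar\omega^{p-1} \equiv 0 \pmod p,$$

and therefore $u_{p-1} \equiv 0 \pmod p$. If n = q, then 

$$\omega^{q+1} \equiv N\omega, \quad \bar\omega^{q+1} \equiv N\omega \pmod q,$$
$$u_{q+1}\sqrt 5 \equiv 0 \pmod q$$

and $u_{q+1} \equiv 0 \pmod q$. 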
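
The two divisibility statements just obtained are easy to confirm for small primes. The following sketch assumes the usual indexing $u_0 = 0$, $u_1 = 1$; the names `fib_mod`, `is_prime` and `fibonacci_divisibility_holds` are introduced here for illustration only.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality check; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def fib_mod(n, m):
    """Return the nth Fibonacci number u_n (u_0 = 0, u_1 = 1) reduced modulo m."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % m
    return a % m

def fibonacci_divisibility_holds(limit):
    """Check p | u_{p-1} for primes 5n +- 1 and q | u_{q+1} for primes 5n +- 2, below limit."""
    for prime in range(2, limit):
        if not is_prime(prime) or prime == 5:
            continue
        index = prime - 1 if prime % 5 in (1, 4) else prime + 1
        if fib_mod(index, prime) != 0:
            return False
    return True

print(fibonacci_divisibility_holds(500))
```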


15.5. Lucas’s test for the primality of the Mersenne number $M_{4n+3}$. 
We are now in a position to prove a remarkable theorem which is due, in 
substance at any rate, to Lucas, and which contains a necessary and sufficient 
condition for the primality of $M_{4n+3}$. Many ‘necessary and sufficient 
conditions’ contain no more than a transformation of a problem, but this 
one gives a practical test which can be applied to otherwise inaccessible 
examples. 

We define the sequence 

$$r_1, r_2, r_3, \ldots = 3, 7, 47, \ldots$$

by 

$$r_m = \omega^{2^m} + \bar\omega^{2^m},$$
where $\omega$ is the number (15.4.2) and $\bar\omega = -1/\omega$. Then 

$$r_{m+1} = r_m^2 - 2.$$

In the notation of § 10.14, 

$$r_m = v_{2^m}.$$

No two $r_m$ have a common factor, since (i) they are all odd, and (ii) 

$$r_m \equiv 0 \to r_{m+1} \equiv -2 \to r_\nu \equiv 2 \quad (\nu > m + 1),$$

to any odd prime modulus. 

Theorem 259. If p is a prime 4n + 3, and 

$$M = M_p = 2^p - 1$$

is the corresponding Mersenne number, then M is prime if 

$$r_{p-1} \equiv 0 \pmod M, \tag{15.5.1}$$

and otherwise composite. 

(1) Suppose M prime. Since 

$$M = 8 \cdot 16^n - 1 \equiv 8 - 1 \equiv 2 \pmod 5,$$

we may take $\alpha = \omega$, q = M in (15.4.6). Hence 

$$\omega^{2^p} = \omega^{M+1} \equiv N\omega = -1 \pmod M,$$
$$r_{p-1} = \bar\omega^{2^{p-1}} (\omega^{2^p} + 1) \equiv 0 \pmod M,$$

which is (15.5.1). 

(2) Suppose (15.5.1) true. Then 

$$\omega^{2^p} + 1 = \omega^{2^{p-1}} r_{p-1} \equiv 0 \pmod M,$$
$$\omega^{2^p} \equiv -1 \pmod M, \tag{15.5.2}$$
$$\omega^{2^{p+1}} \equiv 1 \pmod M. \tag{15.5.3}$$
The same congruences are true, a fortiori, to any modulus $\tau$ which 
divides M. 

Suppose that 

$$M = p_1 p_2 \ldots q_1 q_2 \ldots$$

is the expression of M as a product of rational primes, $p_i$ being a prime 
5n ± 1 (so that $p_i$ is the product of two conjugate primes of the field) and 
$q_j$ a prime 5n ± 2. Since M ≡ 2 (mod 5), there is at least one $q_j$. 
The congruence 

$$\omega^x \equiv 1 \pmod{\tau},$$

or P(x), is true, after (15.5.3), when $x = 2^{p+1}$, and the smallest positive 
solution is, by Theorem 69, a divisor of $2^{p+1}$. These divisors, apart from 
$2^{p+1}$, are $2^p, 2^{p-1}, \ldots$, and P(x) is false for all of them, by (15.5.2). Hence 
$2^{p+1}$ is the smallest solution, and every solution is a multiple of this one. 
But 

$$\omega^{p_i - 1} \equiv 1 \pmod{p_i},$$
$$\omega^{2(q_j + 1)} \equiv (N\omega)^2 = 1 \pmod{q_j},$$

by (15.4.7) and (15.4.6). Hence $p_i - 1$ and $2(q_j + 1)$ are multiples of $2^{p+1}$, 
and 

$$p_i = 2^{p+1} h_i + 1,$$
$$q_j = 2^p k_j - 1,$$

for some $h_i$ and $k_j$. The first hypothesis is impossible because the right-hand 
side is greater than M; and the second is impossible unless 

$$k_j = 1, \quad q_j = M.$$

Hence M is prime. 

The test in Theorem 259 applies only when p ≡ 3 (mod 4). The sequence 

$$4, 14, 194, \ldots$$

(constructed by the same rule) gives a test (verbally identical) for any p. In 
this case the relevant field is $k(\sqrt 3)$. We have selected the test in Theorem 
259 because the proof is slightly simpler. 
To take a trivial example, suppose p = 7, $M_p = 127$. The numbers $r_m$ 
of Theorem 259, reduced (mod M), are 

$$3,\ 7,\ 47,\ 2207 \equiv 48,\ 2302 \equiv 16,\ 254 \equiv 0,$$

and 127 is prime. If p = 127, for example, we must square 125 residues, 
which may contain as many as 39 digits (in the decimal scale). Such computations 
were, at one time, formidable, but quite practicable, and it was 
in this way that Lucas showed $M_{127}$ to be prime. The construction of electronic 
digital computers enabled the tests to be applied to $M_p$ with larger 
p. These computers usually work in the binary scale in which reduction 
to modulus $2^n - 1$ is particularly simple. But their great advantage is, of 
course, their speed. Thus $M_{19937}$ was tested in about 35 minutes, in 1971, 
by Tuckerman on an IBM 360/91. 
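
The test of Theorem 259 amounts to p - 2 squarings modulo M. The sketch below is one possible rendering, not the historical procedure; the function names and the `start` parameter are assumptions of this example. With `start=3` it follows Theorem 259 (p a prime 4n + 3), and with `start=4` it follows the sequence 4, 14, 194, ... mentioned above, applied here to odd primes p.

```python
def lucas_residues(p, start=3):
    """Return r_1, ..., r_{p-1} reduced modulo M = 2**p - 1, where r_{m+1} = r_m**2 - 2."""
    M = (1 << p) - 1
    r = start % M
    residues = [r]
    for _ in range(p - 2):          # p - 2 squarings lead from r_1 to r_{p-1}
        r = (r * r - 2) % M
        residues.append(r)
    return residues

def lucas_test(p, start=3):
    """Return True when r_{p-1} is congruent to 0 (mod 2**p - 1)."""
    return lucas_residues(p, start)[-1] == 0

print(lucas_residues(7))            # the worked example p = 7, M = 127
print([p for p in (3, 7, 11, 19, 23, 31, 43) if lucas_test(p)])
```

Python integers are unbounded, so the same code handles p = 127 directly; no special reduction modulo $2^n - 1$ is attempted here, although the text points out that such reduction is particularly simple in binary.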

15.6. General remarks on the arithmetic of quadratic fields. The 
construction of an arithmetic in a field which is not simple, like $k\{\sqrt{(-5)}\}$ 
or $k(\sqrt{10})$, demands new ideas which (though they are not particularly 
difficult) we cannot develop systematically here. We add only some miscellaneous 
remarks which may be useful to a reader who wishes to study 
the subject more seriously. 

We state below three properties, A, B, and C, common to the ‘simple’ 
fields which we have examined. These properties are all consequences of 
the Euclidean algorithm, when such an algorithm exists, and it was thus 
that we proved them in these fields. They are, however, true in any simple 
field, whether the field is Euclidean or not. We shall not prove so much as 
this; but a little consideration of the logical relations between them will be 
instructive. 

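A. If $\alpha$ and $\beta$ are integers of the field, then there is an integer $\delta$ with the 
properties 

(A i) $\delta \mid \alpha$, $\delta \mid \beta$, 

and 

(A ii) $\delta_1 \mid \alpha \,.\, \delta_1 \mid \beta \to \delta_1 \mid \delta$. 

Thus $\delta$ is the highest, or ‘most comprehensive’, common divisor $(\alpha, \beta)$ 
of $\alpha$ and $\beta$, as we defined it, in k(i), in § 12.8. 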

B. If $\alpha$ and $\beta$ are integers of the field, then there is an integer $\delta$ with the 
properties 

(B i) $\delta \mid \alpha$, $\delta \mid \beta$, 

and 

(B ii) $\delta$ is a linear combination of $\alpha$ and $\beta$; there are integers $\lambda$ and $\mu$ 
such that 

$$\lambda\alpha + \mu\beta = \delta.$$


It is obvious that B implies A; (B i) is the same as (A i), and a 8 with the 
properties (B i) and (B ii) has the properties (A i) and (A ii). The converse, 
though true in the quadratic fields in which we are interested now, is less 
obvious, and depends upon the special properties of these fields. 

There are ‘fields’ in which ‘integers’ possess a highest common divisor in sense A but 
not in sense B. Thus the aggregate of all rational functions 


$$R(x, y) = \frac{P(x, y)}{Q(x, y)}$$

of two independent variables, with rational coefficients, is a field in the sense explained at 
the end of § 14.1. We may call the polynomials P(x, y) of the field the ‘integers’, regarding 
two polynomials as the same when they differ only by a constant factor. Two polynomials 
have a greatest common divisor in sense A; thus x and y have the greatest common divisor 
1. But there are no polynomials P(x, y) and Q(x, y) such that 

$$xP(x, y) + yQ(x, y) = 1.$$


C. Factorization in the field is unique: the field is simple. 
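It is plain that B implies C; for (B i) and (B ii) imply 

$$\delta\gamma \mid \alpha\gamma, \quad \delta\gamma \mid \beta\gamma, \quad \lambda\alpha\gamma + \mu\beta\gamma = \delta\gamma,$$

and so 

$$(\alpha\gamma, \beta\gamma) = \delta\gamma; \tag{15.6.1}$$

and from this C follows as in § 12.8. 

That A implies C is not quite so obvious, but may be proved as follows. 
It is enough to deduce (15.6.1) from A. Let 

$$(\alpha\gamma, \beta\gamma) = \Delta.$$

Then 

$$\delta \mid \alpha \,.\, \delta \mid \beta \to \delta\gamma \mid \alpha\gamma \,.\, \delta\gamma \mid \beta\gamma,$$

and so, by (A ii), 

$$\delta\gamma \mid \Delta.$$

Hence 

$$\Delta = \delta\gamma\rho,$$

say. But $\Delta \mid \alpha\gamma$, $\Delta \mid \beta\gamma$ and so 

$$\delta\rho \mid \alpha, \quad \delta\rho \mid \beta;$$

and hence, again by (A ii), $\delta\rho \mid \delta$. 

Hence $\rho$ is a unity, and $\Delta = \delta\gamma$. 

On the other hand, it is obvious that C implies A; for $\delta$ is the product 
of all prime factors common to $\alpha$ and $\beta$. That C implies B is again less 
immediate, and depends, like the inference from A to B, on the special 
properties of the fields in question.† 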

15.7. Ideals in a quadratic field. There is another property common 
to all simple quadratic fields. To fix our ideas, we consider the field k(i), 
whose basis (§ 14.3) is [1, i]. 

A lattice $\Lambda$ is‡ the aggregate of all points‖ 

$$m\alpha + n\beta,$$

$\alpha$ and $\beta$ being the points P and Q of § 3.5, and m and n running through 
the rational integers. We say that $[\alpha, \beta]$ is a basis of $\Lambda$, and write 

$$\Lambda = [\alpha, \beta];$$

a lattice will, of course, have many different bases. The lattice is a modulus 
in the sense of § 2.9, and has the property 

$$\rho \in \Lambda \,.\, \sigma \in \Lambda \to m\rho + n\sigma \in \Lambda \tag{15.7.1}$$

for any rational integral m and n. 

Among lattices there is a sub-class of peculiar importance. Suppose that 
$\Lambda$ has, in addition to (15.7.1), the property 

$$\gamma \in \Lambda \to i\gamma \in \Lambda. \tag{15.7.2}$$

† In fact both inferences depend on just those arguments which are required in the elements of the 
theory of ideals in a quadratic field. 

‡ See § 3.5. There, however, we reserved the symbol $\Lambda$ for the principal lattice. 

‖ We do not distinguish between a point and the number which is its affix in the Argand diagram. 
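Then, since every integer of k(i) is of the form m + ni, the properties (15.7.1) and 
(15.7.2) together give 

$$\gamma \in \Lambda \to \mu\gamma \in \Lambda$$

for every integer $\mu$ of k(i); all multiples of points of $\Lambda$ by integers of k(i) 
are also points of $\Lambda$. Such a lattice is called an ideal. If $\Lambda$ is an ideal, and 
$\rho$ and $\sigma$ belong to $\Lambda$, then $\mu\rho + \nu\sigma$ belongs to $\Lambda$: 

$$\rho \in \Lambda \,.\, \sigma \in \Lambda \to \mu\rho + \nu\sigma \in \Lambda \tag{15.7.3}$$

for all integral $\mu$ and $\nu$. This property includes, but states much more than, 
(15.7.1). 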

Suppose now that $\Lambda$ is an ideal with basis $[\alpha, \beta]$, and that 

$$(\alpha, \beta) = \delta.$$

Then every point of $\Lambda$ is a multiple of $\delta$. Also, since $\delta$ is a linear combination 
of $\alpha$ and $\beta$, $\delta$ and all its multiples are points of $\Lambda$. Thus $\Lambda$ is the class of 
all multiples of $\delta$; and it is plain that, conversely, the class of multiples of 
any $\delta$ is an ideal $\Lambda$. Any ideal is the class of multiples of an integer of the 
field, and any such class is an ideal. 

If $\Lambda$ is the class of multiples of $\rho$, we write 

$$\Lambda = \{\rho\}.$$

In particular the fundamental lattice, formed by all the integers of the field, 
is {1}. 

The properties of an integer $\rho$ may be restated as properties of the ideal 
$\{\rho\}$. Thus $\sigma \mid \rho$ means that $\{\rho\}$ is a part of $\{\sigma\}$. We can then say that ‘$\{\rho\}$ 
is divisible by $\{\sigma\}$’, and write 

$$\{\sigma\} \mid \{\rho\}.$$

Or again we can write 

$$\{\sigma\} \mid \rho, \quad \rho \equiv 0 \pmod{\{\sigma\}},$$

these assertions meaning that the number $\rho$ belongs to the ideal $\{\sigma\}$. In 
this way we can restate the whole of the arithmetic of the field in terms of 
ideals, though, in k(i), we gain nothing substantial by such a restatement. 
An ideal being always the class of multiples of an integer, the new arithmetic 
is merely a verbal translation of the old one. 
We can, however, define ideals in any quadratic field. We wish to use the 
geometrical imagery of the complex plane, and we shall therefore consider 
only complex fields. 

Suppose that $k(\sqrt m)$ is a complex field with basis $[1, \omega]$.† We may define 
a lattice as we defined it above in k(i), and an ideal as a lattice which has 
the property 

$$\gamma \in \Lambda \to \omega\gamma \in \Lambda, \tag{15.7.4}$$

analogous to (15.7.2). As in k(i), such a lattice has also the property 
(15.7.3), and this property might be used as an alternative definition of 
an ideal. 

Since two numbers $\alpha$ and $\beta$ have not necessarily a ‘greatest common 
divisor’ we can no longer prove that an ideal r has necessarily the form 
$\{\rho\}$; any $\{\rho\}$ is an ideal, but the converse is not generally true. But the 
definitions above, which were logically independent of this reduction, are 
still available; we can define 

$$\mathbf{s} \mid \mathbf{r}$$

as meaning that every number of r belongs to s, and 

$$\rho \equiv 0 \pmod{\mathbf{s}}$$

as meaning that $\rho$ belongs to s. We can thus define words like divisible, 
divisor, and prime with reference to ideals, and have the foundations for 
an arithmetic which is at any rate as extensive as the ordinary arithmetic of 
simple fields, and may perhaps be useful where such ordinary arithmetic 
fails. That this hope is justified, and that the notion of an ideal leads to a 
complete re-establishment of arithmetic in any field, is shown in system- 
atic treatises on the theory of algebraic numbers. The reconstruction is as 
effective in real as in complex fields, though not all of our geometrical 
language is then appropriate. 

An ideal of the special type {p} is called a principal ideal; and the fourth 
characteristic property of simple quadratic fields, to which we referred at 
the beginning of this section, is 

D. Every ideal of a simple field is a principal ideal. 

This property may also be stated, when the field is complex, in a simple 
geometrical form. In k(i) an ideal, that is to say a lattice with the property 

† $\omega = \sqrt m$ when $m \not\equiv 1 \pmod 4$. 
(15.7.2), is square; for it is of the form $\{\rho\}$, and may be regarded as the 
figure of lines based on the origin and the points $\rho$ and $i\rho$. More generally 

E. If m < 0 and $k(\sqrt m)$ is simple, then every ideal of $k(\sqrt m)$ is a lattice 
similar in shape to the lattice formed by all the integers of the field. 

It is instructive to verify that this is not true in $k\{\sqrt{(-5)}\}$. The lattice 

$$m\alpha + n\beta = m \cdot 3 + n\{-1 + \sqrt{(-5)}\}$$

is an ideal, for $\omega = \sqrt{(-5)}$ and 

$$\omega\alpha = \alpha + 3\beta, \quad \omega\beta = -2\alpha - \beta.$$

But, as is shown by Fig. 7 (and may, of course, be verified analytically), 
the lattice is not similar to the lattice of all integers of the field. 


15.8. Other fields. We conclude this chapter with a few remarks about 
some non-quadratic fields of particularly interesting types. We leave the 
verification of most of our assertions to the reader. 

(i) The field $k(\sqrt 2 + i)$. The number 

$$\vartheta = \sqrt 2 + i$$

satisfies 

$$\vartheta^4 - 2\vartheta^2 + 9 = 0,$$

and the number defines a field which we denote by $k(\sqrt 2 + i)$. The numbers 
of the field are 

$$\xi = r + si + t\sqrt 2 + ui\sqrt 2, \tag{15.8.1}$$

where r, s, t, u are rational. The integers of the field are 

$$\xi = a + bi + c\sqrt 2 + di\sqrt 2, \tag{15.8.2}$$

where a and b are integers and c and d are either both integers or both 
halves of odd integers. 

The conjugates of $\xi$ are the numbers $\xi_1, \xi_2, \xi_3$ formed by changing the 
sign of either or both of i and $\sqrt 2$ in (15.8.1) or (15.8.2), and the norm $N\xi$ 
of $\xi$ is defined by 

$$N\xi = \xi\xi_1\xi_2\xi_3.$$

Divisibility, and so forth, are defined as in the fields already considered. 
There is a Euclidean algorithm, and factorization is unique.† 

(ii) The field $k(\sqrt 2 + \sqrt 3)$. The number 

$$\vartheta = \sqrt 2 + \sqrt 3$$

satisfies the equation 

$$\vartheta^4 - 10\vartheta^2 + 1 = 0.$$

† Theorem 215 stands in the field as stated in § 12.8. The proof demands some calculation. 

The numbers of the field are 

$$\xi = r + s\sqrt 2 + t\sqrt 3 + u\sqrt 6,$$

and the integers are the numbers 

$$\xi = a + b\sqrt 2 + c\sqrt 3 + d\sqrt 6,$$

where a and c are integers and b and d are either both integers or both halves 
of odd integers. There is again a Euclidean algorithm, and factorization is 
unique. 

These fields are simple examples of ‘biquadratic’ fields. 

(iii) The field $k(e^{\frac25\pi i})$. The number $\vartheta = e^{\frac25\pi i}$ satisfies the equation 

$$\frac{\vartheta^5 - 1}{\vartheta - 1} = \vartheta^4 + \vartheta^3 + \vartheta^2 + \vartheta + 1 = 0.$$

The field is, after k(i) and $k(\rho)$, the simplest ‘cyclotomic’ field.† 
The numbers of the field are 

$$\xi = r + s\vartheta + t\vartheta^2 + u\vartheta^3,$$

and the integers are the numbers in which r, s, t, u are integral. The 
conjugates of $\xi$ are the numbers $\xi_1, \xi_2, \xi_3$ obtained by changing $\vartheta$ into 
$\vartheta^2, \vartheta^3, \vartheta^4$, and its norm is 

$$N\xi = \xi\xi_1\xi_2\xi_3.$$

There is a Euclidean algorithm, and factorization is unique. 

The number of unities in k(i) and $k(\rho)$ is finite. In $k(e^{\frac25\pi i})$ the number 
is infinite. Thus 

$$(1 + \vartheta) \mid (\vartheta + \vartheta^2 + \vartheta^3 + \vartheta^4)$$

and $\vartheta + \vartheta^2 + \vartheta^3 + \vartheta^4 = -1$ so that $1 + \vartheta$ and all its powers are unities. 

It is plainly this field which we must consider if we wish to prove 
‘Fermat’s last theorem’, when n = 5, by the method of § 13.4. The 
proof follows the same lines, but there are various complications of 
detail. 

† The field $k(\vartheta)$, with $\vartheta$ a primitive nth root of unity, is called cyclotomic because $\vartheta$ and its powers 
are the complex coordinates of the vertices of a regular n-agon inscribed in the unit circle. 
The field defined by a primitive nth root of unity is simple, in the sense 
of § 14.7, when† 

$$n = 3, 4, 5, 8.$$


NOTES 

§ 15.5. Lucas stated two tests for the primality of $M_p$, but his statements of his theorems 
vary, and he never published any complete proof of either. The argument in the text is due 
to Western, Journal London Math. Soc. 7 (1932), 130-7. The second theorem, not proved 
in the text, is that referred to in the penultimate paragraph of the section. Western proves 
this theorem by using the field $k(\sqrt 3)$. Other proofs, independent of the theory of algebraic 
numbers, have been given by D. H. Lehmer, Annals of Math. (2) 31 (1930), 419-48, and 
Journal London Math. Soc. 10 (1935), 162-5. 

Professor Newman drew our attention to the following result, which can be proved by a 
simple extension of the argument of this section. 

Let $h < 2^m$ be odd, $M = 2^m h - 1 \equiv \pm 2 \pmod 5$ and 

$$R_1 = \omega^{2h} + \bar\omega^{2h}, \quad R_j = R_{j-1}^2 - 2 \ (j \geq 2).$$

Then a necessary and sufficient condition for M to be prime is that 

$$R_{m-1} \equiv 0 \pmod M.$$

This result was stated by Lucas [Amer. Journal of Math. 1 (1878), 310], who gives a 
similar (but apparently erroneous) test for numbers of the form $N = h2^m + 1$. The primality 
of the latter can, however, be determined by the test of Theorem 102, which also requires 
about m squarings and reductions (mod N). The two tests would provide a practicable means 
of seeking large prime pairs (p, p + 2). 

§§ 15.6-7. These sections have been much improved as a result of criticisms from 
Mr. Ingham, who read an earlier version. The remark about polynomials in § 15.6 is due to 
Bochner, Journal London Math. Soc. 9 (1934), 4. 

§ 15.8. There is a proof that $k(e^{\frac25\pi i})$ is Euclidean in Landau, Vorlesungen, iii. 228-31. 

The list of fields $k(e^{2\pi i/m})$ with the unique factorization property has been completely 
determined by Masley and Montgomery (J. Reine Angew. Math. 286/287 (1976), 248-56). 
If m is odd, the values m and 2m lead to the same field. Bearing this in mind there are 
exactly 29 distinct fields for m ≥ 3, corresponding to 

m = 3, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17, 19, 20, 21, 24, 25, 27, 28, 
32, 33, 35, 36, 40, 44, 45, 48, 60, 84. 

† $e^{\frac28\pi i} = e^{\frac14\pi i} = (1 + i)/\sqrt 2$ is a number of $k(\sqrt 2 + i)$. 
THE ARITHMETICAL FUNCTIONS φ(n), μ(n), 
d(n), σ(n), r(n) 

16.1. The function φ(n). In this and the next two chapters we shall 
study the properties of certain ‘arithmetical functions’ of n, that is to say 
functions f(n) of the positive integer n defined in a manner which expresses 
some arithmetical property of n. 

The function $\phi(n)$ was defined in § 5.5, for n > 1, as the number of 
positive integers less than and prime to n. We proved (Theorem 62) that 

$$\phi(n) = n \prod_{p \mid n} \left(1 - \frac1p\right). \tag{16.1.1}$$
This formula is also an immediate consequence of the general principle 
expressed by the theorem which follows. 

Theorem 260. If there are N objects, of which $N_\alpha$ have the property 
$\alpha$, $N_\beta$ have $\beta$, ..., $N_{\alpha\beta}$ have both $\alpha$ and $\beta$, ..., $N_{\alpha\beta\gamma}$ have $\alpha$, $\beta$, and $\gamma$, ..., 
and so on, then the number of the objects which have none of $\alpha, \beta, \gamma, \ldots$ 
is 

$$N - N_\alpha - N_\beta - \cdots + N_{\alpha\beta} + \cdots - N_{\alpha\beta\gamma} - \cdots. \tag{16.1.2}$$

Suppose that O is an object which has just k of the properties $\alpha, \beta, \ldots$. 
Then O contributes 1 to N. If k ≥ 1, O also contributes 1 to k of $N_\alpha$, 
$N_\beta$, ..., to $\frac12 k(k - 1)$ of $N_{\alpha\beta}, \ldots$, to 

$$\frac{k(k - 1)(k - 2)}{1 \cdot 2 \cdot 3}$$

of $N_{\alpha\beta\gamma}, \ldots$, and so on. Hence, if k ≥ 1, it contributes 

$$1 - k + \frac{k(k - 1)}{1 \cdot 2} - \frac{k(k - 1)(k - 2)}{1 \cdot 2 \cdot 3} + \cdots = (1 - 1)^k = 0$$

to the sum (16.1.2). On the other hand, if k = 0, it contributes 1. Hence 
(16.1.2) is the number of objects possessing none of the properties. 

The number of integers not greater than n and divisible by a is 

$$\left[\frac na\right].$$

If a is prime to b, then the number of integers not greater than n, and 
divisible by both a and b, is 

$$\left[\frac n{ab}\right];$$

and so on. Hence, taking $\alpha, \beta, \gamma, \ldots$ to be divisibility by a, b, c, ..., we 
obtain 

Theorem 261. The number of integers, less than or equal to n, and not 
divisible by any one of a coprime set of integers a, b, ..., is 

$$[n] - \sum \left[\frac na\right] + \sum \left[\frac n{ab}\right] - \cdots.$$

If we take a, b, ... to be the different prime factors p, p', ... of n, we 
obtain 

$$\phi(n) = n - \sum \frac np + \sum \frac n{pp'} - \cdots = n \prod_{p \mid n} \left(1 - \frac1p\right), \tag{16.1.3}$$

which is Theorem 62. 
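
Theorem 261 translates directly into an alternating sum over subsets of the moduli. The sketch below is illustrative only; `count_not_divisible`, `prime_factors` and `phi` are names chosen for this example, and the moduli are assumed to be pairwise coprime, as the theorem requires.

```python
from itertools import combinations
from math import prod

def count_not_divisible(n, moduli):
    """Count 1 <= x <= n divisible by none of the pairwise coprime moduli (Theorem 261)."""
    total = 0
    for size in range(len(moduli) + 1):
        sign = -1 if size % 2 else 1
        for subset in combinations(moduli, size):
            total += sign * (n // prod(subset))   # prod(()) == 1 gives the leading term [n]
    return total

def prime_factors(n):
    """Return the distinct prime factors of n in increasing order."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def phi(n):
    """Euler's function obtained from Theorem 261, as in (16.1.3)."""
    return count_not_divisible(n, prime_factors(n))

print(count_not_divisible(30, [2, 3, 5]), [phi(n) for n in range(1, 13)])
```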

16.2. A further proof of Theorem 63. Consider the set of n rational 
fractions 

$$\frac hn \quad (1 \leq h \leq n). \tag{16.2.1}$$

We can express each of these fractions in ‘irreducible’ form in just one way, 
that is, 

$$\frac hn = \frac ad,$$

where $d \mid n$ and 

$$1 \leq a \leq d, \quad (a, d) = 1, \tag{16.2.2}$$

and a and d are uniquely determined by h and n. Conversely, every fraction 
a/d, for which $d \mid n$ and (16.2.2) is satisfied, appears in the set (16.2.1), 
though in general not in reduced form. Hence, for any function F(x), we 
have 

$$\sum_{1 \leq h \leq n} F\left(\frac hn\right) = \sum_{d \mid n} \sum_{\substack{1 \leq a \leq d \\ (a, d) = 1}} F\left(\frac ad\right). \tag{16.2.3}$$

Again, for a particular d, there are (by definition) just $\phi(d)$ values of a 
satisfying (16.2.2). Hence, if we put F(x) = 1 in (16.2.3), we have 

$$n = \sum_{d \mid n} \phi(d).$$
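
The counting argument of this section can be imitated directly: reduce each fraction h/n and group by denominator. A minimal sketch follows, in which `denominators` and `phi_by_count` are illustrative names and Python's `Fraction` performs the reduction to lowest terms.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def denominators(n):
    """Count how many of the fractions h/n (1 <= h <= n) reduce to each denominator d."""
    return Counter(Fraction(h, n).denominator for h in range(1, n + 1))

def phi_by_count(d):
    """Number of a with 1 <= a <= d and (a, d) = 1."""
    return sum(1 for a in range(1, d + 1) if gcd(a, d) == 1)

counts = denominators(12)
print(sorted(counts.items()))
print(all(counts[d] == phi_by_count(d) for d in counts), sum(counts.values()))
```

Each divisor d of n occurs exactly $\phi(d)$ times among the reduced denominators, and the counts add up to n.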

16.3. The Möbius function. The Möbius function $\mu(n)$ is defined as 
follows: 

(i) $\mu(1) = 1$; 

(ii) $\mu(n) = 0$ if n has a squared factor; 

(iii) $\mu(p_1 p_2 \ldots p_k) = (-1)^k$ if all the primes $p_1, p_2, \ldots, p_k$ are different. 
Thus $\mu(2) = -1$, $\mu(4) = 0$, $\mu(6) = 1$. 

Theorem 262. $\mu(n)$ is multiplicative.† 

This follows immediately from the definition of $\mu(n)$. 

From (16.1.3) and the definition of $\mu(n)$ we obtain 

$$\phi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d} = \sum_{d \mid n} \frac nd\, \mu(d) = \sum_{d \mid n} d\, \mu\left(\frac nd\right) = \sum_{dd' = n} d'\mu(d). \tag{16.3.1}$$ ‡ 

Next, we prove 
Theorem 263: 

$$\sum_{d \mid n} \mu(d) = 1 \ (n = 1), \quad \sum_{d \mid n} \mu(d) = 0 \ (n > 1).$$

Theorem 264. If n > 1, and k is the number of different prime factors 
of n, then 

$$\sum_{d \mid n} |\mu(d)| = 2^k.$$

In fact, if k ≥ 1 and $n = p_1^{a_1} \ldots p_k^{a_k}$, we have 

$$\sum_{d \mid n} \mu(d) = 1 + \sum_i \mu(p_i) + \sum_{i, j} \mu(p_i p_j) + \cdots$$
$$= 1 - k + \binom k2 - \binom k3 + \cdots = (1 - 1)^k = 0,$$

† See § 5.5. 

‡ A sum extended over all pairs d, d' for which dd' = n. 
while, if n = 1, $\mu(n) = 1$. This proves Theorem 263. The proof of Theorem 
264 is similar. There is an alternative proof of Theorem 263 depending 
on an important general theorem. 

Theorem 265. If f(n) is a multiplicative function of n, then so is 

$$g(n) = \sum_{d \mid n} f(d).$$

If (n, n') = 1, $d \mid n$, and $d' \mid n'$, then (d, d') = 1 and c = dd' runs through 
all divisors of nn'. Hence 

$$g(nn') = \sum_{c \mid nn'} f(c) = \sum_{d \mid n,\, d' \mid n'} f(dd')$$
$$= \sum_{d \mid n} f(d) \sum_{d' \mid n'} f(d') = g(n)g(n').$$

To deduce Theorem 263 we write $f(n) = \mu(n)$, so that 

$$g(n) = \sum_{d \mid n} \mu(d).$$

Then g(1) = 1, and 

$$g(p^m) = 1 + \mu(p) = 0$$

when m ≥ 1. Hence, when $n = p_1^{a_1} \ldots p_k^{a_k} > 1$, 

$$g(n) = g(p_1^{a_1}) g(p_2^{a_2}) \ldots = 0.$$
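
Theorems 263 and 264 can be confirmed by direct computation. The following sketch computes $\mu(n)$ by trial division; the function names are illustrative, and no attempt is made at efficiency.

```python
def mobius(n):
    """Return mu(n): 0 if n has a squared factor, otherwise (-1)**(number of prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p*p divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result

def divisors(n):
    """Return all positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def distinct_prime_count(n):
    """Return k, the number of different prime factors of n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

for n in (1, 12, 30):
    print(n, sum(mobius(d) for d in divisors(n)), sum(abs(mobius(d)) for d in divisors(n)))
```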

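16.4. The Möbius inversion formula. In what follows we shall make
frequent use of a general ‘inversion’ formula first proved by Möbius.

Theorem 266. If

\[ g(n) = \sum_{d \mid n} f(d), \]

then

\[ f(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) g(d) = \sum_{d \mid n} \mu(d)\, g\!\left(\frac{n}{d}\right). \]

In fact

\[ \sum_{d \mid n} \mu(d)\, g\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{c \mid \frac{n}{d}} f(c) = \sum_{cd \mid n} \mu(d) f(c) = \sum_{c \mid n} f(c) \sum_{d \mid \frac{n}{c}} \mu(d). \]

The inner sum here is 1 if $n/c = 1$, i.e. if $c = n$, and 0 otherwise, by
Theorem 263, so that the repeated sum reduces to $f(n)$.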
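The inversion can be checked numerically on an arbitrary function. The sketch below is illustrative only: the helper names (`divisors`, `summatory`, `invert`) and the test function $f(n) = n^2 + 1$ are assumptions chosen for the example, not anything taken from the text.

```python
def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def summatory(f, n):
    """g(n) = sum of f(d) over the divisors d of n."""
    return sum(f(d) for d in divisors(n))


def invert(g, n):
    """f(n) = sum of mu(n/d) g(d) over the divisors d of n (Theorem 266)."""
    return sum(mobius(n // d) * g(d) for d in divisors(n))


f = lambda n: n * n + 1            # arbitrary, not multiplicative
g = lambda n: summatory(f, n)
recovered = [invert(g, n) for n in range(1, 31)]
print(recovered == [f(n) for n in range(1, 31)])  # True
print(invert(lambda n: n, 12))                    # 4
```

The last line takes $g(n) = n$; by (16.3.1) the inverted function is then $\phi(n)$, and indeed $\phi(12) = 4$.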

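Theorem 266 has a converse expressed by

Theorem 267:

\[ f(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) g(d) \;\rightarrow\; g(n) = \sum_{d \mid n} f(d). \]

The proof is similar to that of Theorem 266. We have

\[ \sum_{d \mid n} f(d) = \sum_{d \mid n} f\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \sum_{c \mid \frac{n}{d}} \mu\!\left(\frac{n}{cd}\right) g(c) \]
\[ = \sum_{cd \mid n} \mu\!\left(\frac{n}{cd}\right) g(c) = \sum_{c \mid n} g(c) \sum_{d \mid \frac{n}{c}} \mu\!\left(\frac{n}{cd}\right) = g(n). \]

If we put $g(n) = n$ in Theorem 267, and use (16.3.1), so that $f(n) = \phi(n)$,
we obtain Theorem 63.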

As an example of the use of Theorem 266, we give another proof of 
Theorem 110. 

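We suppose that $d \mid p - 1$ and $c \mid d$, and that $\chi(c)$ is the number of roots
of the congruence $x^d \equiv 1 \pmod{p}$ which belong to $c$. Then (since the
congruence has $d$ roots in all)

\[ \sum_{c \mid d} \chi(c) = d; \]

from which, by Theorem 266, it follows that

\[ \chi(d) = \sum_{c \mid d} \mu(c)\,\frac{d}{c} = \phi(d). \]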
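16.5. Further inversion formulae. There are other inversion formulae
involving $\mu(n)$, of a rather different type.

Theorem 268. If

\[ G(x) = \sum_{n=1}^{[x]} F\!\left(\frac{x}{n}\right) \]

for all positive $x$,† then

\[ F(x) = \sum_{n=1}^{[x]} \mu(n)\, G\!\left(\frac{x}{n}\right). \]

For

\[ \sum_{n=1}^{[x]} \mu(n)\, G\!\left(\frac{x}{n}\right) = \sum_{n=1}^{[x]} \mu(n) \sum_{m=1}^{[x/n]} F\!\left(\frac{x}{mn}\right) = \sum_{1 \le k \le [x]} F\!\left(\frac{x}{k}\right) \sum_{n \mid k} \mu(n) = F(x), \] ‡

by Theorem 263. There is a converse, viz.

Theorem 269:

\[ F(x) = \sum_{n=1}^{[x]} \mu(n)\, G\!\left(\frac{x}{n}\right) \;\rightarrow\; G(x) = \sum_{n=1}^{[x]} F\!\left(\frac{x}{n}\right). \]

This may be proved similarly.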
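Theorem 268 can be tested with exact rational arithmetic. In the sketch below the function names and the sample $F(x) = x^2 + 3$ are assumptions made for illustration; the check is restricted to $x \ge 1$, since for $0 < x < 1$ both sums are empty.

```python
from fractions import Fraction
from math import floor


def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def G_from_F(F, x):
    """G(x) = sum of F(x/n) for 1 <= n <= [x]."""
    return sum(F(x / n) for n in range(1, floor(x) + 1))


def F_from_G(G, x):
    """F(x) = sum of mu(n) G(x/n) for 1 <= n <= [x] (Theorem 268)."""
    return sum(mobius(n) * G(x / n) for n in range(1, floor(x) + 1))


F = lambda x: x * x + 3                 # arbitrary sample function
G = lambda y: G_from_F(F, y)
x = Fraction(37, 2)
print(F_from_G(G, x) == F(x))           # True
```

Using `Fraction` keeps every quotient $x/n$ exact, so the comparison is an equality rather than a floating-point approximation.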

Two further inversion formulae are contained in

Theorem 270:

\[ g(x) = \sum_{m=1}^{\infty} f(mx) \;\equiv\; f(x) = \sum_{n=1}^{\infty} \mu(n)\, g(nx). \]

† An empty sum is as usual to be interpreted as 0. Thus $G(x) = 0$ if $0 < x < 1$.

‡ If $mn = k$ then $n \mid k$, and $k$ runs through the numbers $1, 2, \ldots, [x]$.

The reader should have no difficulty in constructing a proof with the help
of Theorem 263; but some care is required about convergence. A sufficient
condition is that

\[ \sum_{m,n} |f(mnx)| = \sum_{k} d(k)\, |f(kx)| \]

should be convergent. Here $d(k)$ is the number of divisors of $k$.†

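† See § 16.7.

16.6. Evaluation of Ramanujan’s sum. Ramanujan’s sum $c_n(m)$ was
defined in § 5.6 by

(16.6.1) \[ c_n(m) = \sum_{1 \le h \le n,\ (h,n)=1} e\!\left(\frac{hm}{n}\right). \]

We can now express $c_n(m)$ as a sum extended over the common divisors
of $m$ and $n$.

Theorem 271:

\[ c_n(m) = \sum_{d \mid m,\ d \mid n} \mu\!\left(\frac{n}{d}\right) d. \]

If we write

\[ g(n) = \sum_{1 \le h \le n} F\!\left(\frac{h}{n}\right), \qquad f(n) = \sum_{1 \le h \le n,\ (h,n)=1} F\!\left(\frac{h}{n}\right), \]

(16.2.3) becomes

\[ g(n) = \sum_{d \mid n} f(d). \]

By Theorem 266, we have the inverse formula

(16.6.2) \[ f(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) g(d), \]

that is

(16.6.3) \[ \sum_{1 \le h \le n,\ (h,n)=1} F\!\left(\frac{h}{n}\right) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) \sum_{1 \le a \le d} F\!\left(\frac{a}{d}\right). \]

We now take $F(x) = e(mx)$. In this event,

\[ f(n) = c_n(m) \]

by (16.6.1), while

\[ g(n) = \sum_{1 \le h \le n} e\!\left(\frac{hm}{n}\right), \]

which is $n$ or 0 according as $n \mid m$ or $n \nmid m$. Hence (16.6.2) becomes

\[ c_n(m) = \sum_{d \mid n,\ d \mid m} \mu\!\left(\frac{n}{d}\right) d. \]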

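Another simple expression for $c_n(m)$ is given by

Theorem 272. If $(n, m) = a$ and $n = aN$, then

\[ c_n(m) = \frac{\mu(N)\,\phi(n)}{\phi(N)}. \]

By Theorem 271,

\[ c_n(m) = \sum_{d \mid a} d\,\mu\!\left(\frac{n}{d}\right) = \sum_{cd = a} d\,\mu(Nc) = \sum_{c \mid a} \frac{a}{c}\,\mu(Nc). \]

Now $\mu(Nc) = \mu(N)\mu(c)$ or 0 according as $(N, c) = 1$ or not. Hence

\[ c_n(m) = a\,\mu(N) \sum_{c \mid a,\ (c,N)=1} \frac{\mu(c)}{c} = a\,\mu(N) \left( 1 - \sum \frac{1}{p} + \sum \frac{1}{pp'} - \cdots \right), \]

where these sums run over those different $p$ which divide $a$ but do not
divide $N$. Hence

\[ c_n(m) = a\,\mu(N) \prod_{p \mid a,\ p \nmid N} \left( 1 - \frac{1}{p} \right). \]

But, by Theorem 62,

\[ \frac{\phi(n)}{\phi(N)} = \frac{n}{N} \prod_{p \mid n,\ p \nmid N} \left( 1 - \frac{1}{p} \right) = a \prod_{p \mid a,\ p \nmid N} \left( 1 - \frac{1}{p} \right), \]

and Theorem 272 follows at once.

When $m = 1$, we have $c_n(1) = \mu(n)$, that is

(16.6.4) \[ \mu(n) = \sum_{1 \le h \le n,\ (h,n)=1} e\!\left(\frac{h}{n}\right). \]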
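The two closed forms for Ramanujan's sum can be compared with the defining exponential sum. The following sketch is illustrative and not part of the text; it assumes $e(x) = e^{2\pi i x}$ as in § 5.6, and the function names are hypothetical.

```python
import cmath
from math import gcd


def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def phi(n):
    """Euler's function by direct counting."""
    return sum(1 for h in range(1, n + 1) if gcd(h, n) == 1)


def ramanujan_direct(n, m):
    """c_n(m) from the definition (16.6.1); the sum is real, so round it."""
    total = sum(cmath.exp(2j * cmath.pi * h * m / n)
                for h in range(1, n + 1) if gcd(h, n) == 1)
    return round(total.real)


def ramanujan_271(n, m):
    """c_n(m) as the sum over common divisors in Theorem 271."""
    return sum(mobius(n // d) * d
               for d in range(1, n + 1) if n % d == 0 and m % d == 0)


def ramanujan_272(n, m):
    """c_n(m) = mu(N) phi(n) / phi(N) with N = n / (n, m) (Theorem 272)."""
    N = n // gcd(n, m)
    return mobius(N) * phi(n) // phi(N)


print(ramanujan_271(12, 4))  # -2
```

For $n = 12$, $m = 4$ the common divisors are 1, 2, 4, giving $\mu(12) + 2\mu(6) + 4\mu(3) = 0 + 2 - 4 = -2$, which agrees with $\mu(3)\phi(12)/\phi(3)$.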



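16.7. The functions $d(n)$ and $\sigma_k(n)$. The function $d(n)$ is the number
of divisors of $n$, including 1 and $n$, while $\sigma_k(n)$ is the sum of the $k$th powers
of the divisors of $n$. Thus

\[ \sigma_k(n) = \sum_{d \mid n} d^k, \qquad d(n) = \sum_{d \mid n} 1, \]

and $d(n) = \sigma_0(n)$. We write $\sigma(n)$ for $\sigma_1(n)$, the sum of the divisors of $n$.
If

\[ n = p_1^{a_1} p_2^{a_2} \cdots p_l^{a_l}, \]

then the divisors of $n$ are the numbers

\[ p_1^{b_1} p_2^{b_2} \cdots p_l^{b_l}, \]

where

\[ 0 \le b_1 \le a_1, \quad 0 \le b_2 \le a_2, \quad \ldots, \quad 0 \le b_l \le a_l. \]

There are

\[ (a_1 + 1)(a_2 + 1) \cdots (a_l + 1) \]

of these numbers. Hence

Theorem 273:

\[ d(n) = \prod_{i=1}^{l} (a_i + 1). \]

More generally, if $k > 0$,

\[ \sigma_k(n) = \sum_{b_1=0}^{a_1} \sum_{b_2=0}^{a_2} \cdots \sum_{b_l=0}^{a_l} p_1^{b_1 k} p_2^{b_2 k} \cdots p_l^{b_l k} = \prod_{i=1}^{l} \left( 1 + p_i^{k} + p_i^{2k} + \cdots + p_i^{a_i k} \right). \]

Hence

Theorem 274:

\[ \sigma_k(n) = \prod_{i=1}^{l} \frac{p_i^{(a_i+1)k} - 1}{p_i^{k} - 1}. \]

In particular,

Theorem 275:

\[ \sigma(n) = \prod_{i=1}^{l} \frac{p_i^{a_i+1} - 1}{p_i - 1}. \]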
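The product formulae of Theorems 273-275 can be compared with a direct sum over divisors. This is a minimal sketch under the assumption that trial-division factorization is adequate for small $n$; the function names are illustrative.

```python
def factorize(n):
    """Return {p: a} with n equal to the product of p**a."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def sigma_k_product(n, k):
    """sigma_k(n) from the prime factorization."""
    result = 1
    for p, a in factorize(n).items():
        if k == 0:
            result *= a + 1                                     # Theorem 273
        else:
            result *= (p ** ((a + 1) * k) - 1) // (p ** k - 1)  # Theorem 274
    return result


def sigma_k_brute(n, k):
    """sigma_k(n) straight from the definition."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)


print(sigma_k_product(28, 1))  # 56
```

The case $k = 0$ is handled separately because the quotient in Theorem 274 is only defined for $k > 0$.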



16.8. Perfect numbers. A perfect number is a number $n$ such that
$\sigma(n) = 2n$. In other words a number is perfect if it is the sum of its
divisors other than itself. Since $1 + 2 + 3 = 6$, and

\[ 1 + 2 + 4 + 7 + 14 = 28, \]

6 and 28 are perfect numbers.

The only general class of perfect numbers known occurs in Euclid.

Theorem 276. If $2^{n+1} - 1$ is prime, then $2^n(2^{n+1} - 1)$ is perfect.

Write $2^{n+1} - 1 = p$, $N = 2^n p$. Then, by Theorem 275,

\[ \sigma(N) = (2^{n+1} - 1)(p + 1) = 2^{n+1}(2^{n+1} - 1) = 2N, \]

so that $N$ is perfect.

Theorem 276 shows that to every Mersenne prime there corresponds a
perfect number. On the other hand, if $N = 2^n p$ is perfect, we have

\[ \sigma(N) = (2^{n+1} - 1)(p + 1) = 2^{n+1} p \]

and so

\[ p = 2^{n+1} - 1. \]

Hence there is a Mersenne prime corresponding to any perfect number of
the form $2^n p$. But we can prove more than this.

Theorem 277. Any even perfect number is a Euclid number, that is to
say of the form $2^n(2^{n+1} - 1)$, where $2^{n+1} - 1$ is prime.

We can write any such number in the form $N = 2^n b$, where $n > 0$ and
$b$ is odd. By Theorem 275, $\sigma(n)$ is multiplicative, and therefore

\[ \sigma(N) = \sigma(2^n)\,\sigma(b) = (2^{n+1} - 1)\,\sigma(b). \]

Since $N$ is perfect,

\[ \sigma(N) = 2N = 2^{n+1} b; \]

and so

\[ \frac{b}{\sigma(b)} = \frac{2^{n+1} - 1}{2^{n+1}}. \]

The fraction on the right-hand side is in its lowest terms, and therefore

\[ b = (2^{n+1} - 1)c, \qquad \sigma(b) = 2^{n+1} c, \]

where $c$ is an integer.

If $c > 1$, $b$ has at least the divisors $b$, $c$, 1, so that

\[ \sigma(b) \ge b + c + 1 = 2^{n+1} c + 1 > 2^{n+1} c = \sigma(b), \]

a contradiction. Hence $c = 1$, $N = 2^n(2^{n+1} - 1)$, and

\[ \sigma(2^{n+1} - 1) = 2^{n+1}. \]

But, if $2^{n+1} - 1$ is not prime, it has divisors other than itself and 1, and

\[ \sigma(2^{n+1} - 1) > 2^{n+1}. \]

Hence $2^{n+1} - 1$ is prime, and the theorem is proved.
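Both directions can be observed on small cases. The sketch below is an illustration rather than anything from the text: it lists the Euclid numbers for $n \le 12$ (Theorem 276) and, separately, finds the even perfect numbers below $10^4$ by brute force (Theorem 277). The helper names and the search bounds are arbitrary choices.

```python
def sigma(n):
    """Sum of the divisors of n, pairing each d <= sqrt(n) with n // d."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d * d != n:
                total += n // d
        d += 1
    return total


def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euclid_numbers(max_n):
    """Yield 2^n (2^(n+1) - 1) for 1 <= n <= max_n with 2^(n+1) - 1 prime."""
    for n in range(1, max_n + 1):
        p = 2 ** (n + 1) - 1
        if is_prime(p):
            yield 2 ** n * p


perfect = list(euclid_numbers(12))
print(perfect)  # [6, 28, 496, 8128, 33550336]

even_perfect = [N for N in range(2, 10001, 2) if sigma(N) == 2 * N]
print(even_perfect)  # [6, 28, 496, 8128]
```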

The Euclid numbers corresponding to the Mersenne primes are the only
perfect numbers known. It seems probable that there are no odd perfect
numbers, but this has not been proved. The most that is known in this
direction is that any odd perfect number must be greater than $10^{200}$, that it
must have at least 8 different prime factors and that its largest prime factor
must be greater than 100110.†

† See end of chapter notes.

16.9. The function $r(n)$. We define $r(n)$ as the number of representations
of $n$ in the form

\[ n = A^2 + B^2, \]

where $A$ and $B$ are rational integers. We count representations as distinct
even when they differ only ‘trivially’, i.e. in respect of the sign or order of
$A$ and $B$. Thus

\[ 0 = 0^2 + 0^2, \quad r(0) = 1; \]
\[ 1 = (\pm 1)^2 + 0^2 = 0^2 + (\pm 1)^2, \quad r(1) = 4; \]
\[ 5 = (\pm 2)^2 + (\pm 1)^2 = (\pm 1)^2 + (\pm 2)^2, \quad r(5) = 8. \]

We know already (§ 15.1) that $r(n) = 8$ when $n$ is a prime $4m + 1$; the
representation is unique apart from its eight trivial variations. On the other
hand, $r(n) = 0$ when $n$ is of the form $4m + 3$.

We define $\chi(n)$, for $n > 0$, by

\[ \chi(n) = 0 \quad (2 \mid n), \qquad \chi(n) = (-1)^{\frac{1}{2}(n-1)} \quad (2 \nmid n). \]

Thus $\chi(n)$ assumes the values $1, 0, -1, 0, 1, \ldots$ for $n = 1, 2, 3, \ldots$. Since

\[ \tfrac{1}{2}(nn' - 1) - \tfrac{1}{2}(n - 1) - \tfrac{1}{2}(n' - 1) = \tfrac{1}{2}(n - 1)(n' - 1) \equiv 0 \pmod{2} \]

when $n$ and $n'$ are odd, $\chi(n)$ satisfies

\[ \chi(nn') = \chi(n)\,\chi(n') \]

for all $n$ and $n'$. In particular $\chi(n)$ is multiplicative in the sense of § 5.5.

It is plain that, if we write

(16.9.1) \[ \delta(n) = \sum_{d \mid n} \chi(d), \]

then

(16.9.2) \[ \delta(n) = d_1(n) - d_3(n), \]

where $d_1(n)$ and $d_3(n)$ are the numbers of divisors of $n$ of the forms $4m + 1$
and $4m + 3$ respectively.

Suppose now that

(16.9.3) \[ n = 2^{\alpha} N = 2^{\alpha} \mu \nu = 2^{\alpha} \prod p^{r} \prod q^{s}, \]

where $p$ and $q$ are primes $4m + 1$ and $4m + 3$ respectively. If there are no
factors $q$, so that $\prod q^{s}$ is ‘empty’, then we define $\nu$ as 1. Plainly

\[ \delta(n) = \delta(N). \]

The divisors of $N$ are the terms in the product

(16.9.4) \[ \prod (1 + p + \cdots + p^{r}) \prod (1 + q + \cdots + q^{s}). \]

A divisor is $4m + 1$ if it contains an even number of factors $q$, and $4m + 3$
in the contrary case. Hence $\delta(N)$ is obtained by writing 1 for $p$ and $-1$ for
$q$ in (16.9.4); and

(16.9.5) \[ \delta(N) = \prod (r + 1) \prod \left( \frac{1 + (-1)^{s}}{2} \right). \]

If any $s$ is odd, i.e. if $\nu$ is not a square, then

\[ \delta(n) = \delta(N) = 0; \]

while

\[ \delta(n) = \delta(N) = \prod (r + 1) = d(\mu) \]

if $\nu$ is a square.

Our object is to prove

Theorem 278. If $n \ge 1$, then

\[ r(n) = 4\delta(n). \]

We have therefore to show that $r(n)$ is $4d(\mu)$ when $\nu$ is a square, and
zero otherwise.
16.10. Proof of the formula for $r(n)$. We write (16.9.3) in the form

\[ n = \{(1 + i)(1 - i)\}^{\alpha} \prod \{(a + bi)(a - bi)\}^{r} \prod q^{s}, \]

where $a$ and $b$ are positive and unequal and

\[ p = a^2 + b^2. \]

This expression of $p$ is unique (after § 15.1) except for the order of $a$ and $b$.
The factors

\[ 1 \pm i, \quad a \pm bi, \quad q \]

are primes of $k(i)$.
If

\[ n = A^2 + B^2 = (A + Bi)(A - Bi), \]

then

\[ A + Bi = i^{t} (1 + i)^{\alpha_1} (1 - i)^{\alpha_2} \prod \{(a + bi)^{r_1} (a - bi)^{r_2}\} \prod q^{s_1}, \]
\[ A - Bi = i^{-t} (1 - i)^{\alpha_1} (1 + i)^{\alpha_2} \prod \{(a - bi)^{r_1} (a + bi)^{r_2}\} \prod q^{s_2}, \]

where

\[ t = 0, 1, 2, \text{ or } 3, \quad \alpha_1 + \alpha_2 = \alpha, \quad r_1 + r_2 = r, \quad s_1 + s_2 = s. \]

Plainly $s_1 = s_2$, so that every $s$ is even, and $\nu$ is a square. Unless this is so,
there is no representation.

We suppose then that

\[ \nu = \prod q^{s} = \prod q^{2 s_1} \]

is a square. There is no choice in the division of the factors $q$ between
$A + Bi$ and $A - Bi$. There are

\[ 4(\alpha + 1) \prod (r + 1) \]

choices in the division of the other factors. But

\[ \frac{1 - i}{1 + i} = -i \]

is a unity, so that a change in $\alpha_1$ and $\alpha_2$ produces no variation in $A$ and $B$
beyond that produced by variation of $t$. We are thus left with

\[ 4 \prod (r + 1) = 4d(\mu) \]

possibly effective choices, i.e. choices which may produce variation in $A$
and $B$.

The trivial variations in a representation $n = A^2 + B^2$ correspond (i) to
multiplication of $A + Bi$ by a unity and (ii) to exchange of $A + Bi$ with its
conjugate. Thus

\[ 1(A + Bi) = A + Bi, \quad i(A + Bi) = -B + Ai, \]
\[ i^2(A + Bi) = -A - Bi, \quad i^3(A + Bi) = B - Ai, \]

and $A - Bi$, $-B - Ai$, $-A + Bi$, $B + Ai$ are the conjugates of these four
numbers. Any change in $t$ varies the representation. Any change in the $r_1$
and $r_2$ also varies the representation, and in a manner not accounted for by
any change in $t$; for

\[ i^{t} (1 + i)^{\alpha_1} (1 - i)^{\alpha_2} \prod \{(a + bi)^{r_1} (a - bi)^{r_2}\} = i^{t'} (1 + i)^{\alpha_1'} (1 - i)^{\alpha_2'} \prod \{(a + bi)^{r_1'} (a - bi)^{r_2'}\} \]

is impossible, after Theorem 215, unless $r_1 = r_1'$ and $r_2 = r_2'$.† There are
therefore $4d(\mu)$ different sets of values of $A$ and $B$, or of representations
of $n$; and this proves Theorem 278.
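Theorem 278 lends itself to a direct numerical check. The following sketch (not from the text; the function names are illustrative) counts the pairs $(A, B)$ with $A^2 + B^2 = n$ by brute force and compares the count with $4\delta(n)$ computed from (16.9.1).

```python
from math import isqrt


def r_brute(n):
    """Number of ordered pairs of integers (A, B) with A^2 + B^2 = n."""
    count = 0
    bound = isqrt(n)
    for A in range(-bound, bound + 1):
        rest = n - A * A
        B = isqrt(rest)
        if B * B == rest:
            count += 1 if B == 0 else 2   # B and -B are distinct when B != 0
    return count


def chi(d):
    """chi(d) = 0 for even d, and (-1)^((d-1)/2) for odd d."""
    if d % 2 == 0:
        return 0
    return 1 if d % 4 == 1 else -1


def delta(n):
    """delta(n) = sum of chi(d) over the divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)


print(r_brute(5), 4 * delta(5))    # 8 8
print(r_brute(25), 4 * delta(25))  # 12 12
```

For $n = 25$ the twelve representations are $(\pm 5, 0)$, $(0, \pm 5)$, $(\pm 3, \pm 4)$ and $(\pm 4, \pm 3)$.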


NOTES 

§ 16.1. The argument follows Pólya and Szegő, Nos. 21, 25. Theorem 260 is widely
known as the Inclusion-Exclusion Theorem.

§§ 16.3-5. The function $\mu(n)$ occurs implicitly in the work of Euler as early as 1748,
but Möbius, in 1832, was the first to investigate its properties systematically. See Landau,
Handbuch, 567-87 and 901.

§ 16.6. Ramanujan, Collected papers, 180. Our method of proof of Theorem 271 was
suggested by Professor van der Pol. Theorem 272 is due to Hölder, Prace Mat. Fiz. 43
(1936), 13-23. See also Zuckerman, American Math. Monthly, 59 (1952), 230 and Anderson
and Apostol, Duke Math. Journ. 20 (1953), 211-16.

§§ 16.7-8. There is a very full account of the history of the theorems of these sections
in Dickson, History, i, chs. i-ii. References to the theorems referred to at the end of § 16.8
are given by Kishore (Math. Comp. 31 (1977), 274-9).

† Change of $r_1$ into $r_2$, and $r_2$ into $r_1$ (together with corresponding changes in $t$, $\alpha_1$, $\alpha_2$) changes
$A + Bi$ into its conjugate.

Euler showed that any odd perfect number must take the form $p^{a} q_1^{2b_1} \cdots q_r^{2b_r}$ with primes
$p, q_1, \ldots, q_r$ and with $a \equiv p \equiv 1 \pmod{4}$. It is now (2007) known that an odd perfect
number would have to exceed $10^{300}$ (Brent, Cohen, and te Riele, Math. Comp. 57 (1991),
857-68). Moreover, Nielsen has announced (http://arxiv.org/pdf/math/0602485) that an odd
perfect number must have at least 9 distinct prime factors. It is known that the largest prime
factor must exceed $10^{7}$ (Jenkins, Math. Comp. 72 (2003), no. 243, 1549-1554 (electronic)).
Indeed Goto and Ohno have announced that this bound can be increased to $10^{8}$. Nielsen
(Integers 3 (2003), A14, (electronic)) has also shown that an odd perfect number $n$ with $k$
distinct prime factors must satisfy $n < 2^{4^{k}}$.

§ 16.9. Theorem 278 was first proved by Jacobi by means of the theory of elliptic
functions. It is, however, equivalent to one stated by Gauss, D.A., § 182; and there had been
many incomplete proofs or statements published before. See Dickson, History, ii, ch. vi,
and Bachmann, Niedere Zahlentheorie, ii, ch. vii.
GENERATING FUNCTIONS OF ARITHMETICAL FUNCTIONS

17.1. The generation of arithmetical functions by means of Dirichlet
series. A Dirichlet series is a series of the form

(17.1.1) \[ F(s) = \sum_{n=1}^{\infty} \frac{\alpha_n}{n^{s}}. \]

The variable $s$ may be real or complex, but here we shall be concerned
with real values only. $F(s)$, the sum of the series, is called the generating
function of $\alpha_n$.

The theory of Dirichlet series, when studied seriously for its own sake, 
involves many delicate questions of convergence. These are mostly irrel- 
evant here, since we are concerned primarily with the formal side of the 
theory; and most of our results could be proved (as we explain later in 
§ 17.6) without the use of any theorem of analysis or even the notion of 
the sum of an infinite series. There are, however, some theorems which 
must be considered as theorems of analysis; and, even when this is not so, 
the reader will probably find it easier to think of the series which occur as 
sums in the ordinary analytical sense. 

We shall use the four theorems which follow. These are special cases of 
more general theorems which, when they occur in their proper places in 
the general theory, can be proved better by different methods. We confine 
ourselves here to what is essential for our immediate purpose. 

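(1) If $\sum \alpha_n n^{-s}$ is absolutely convergent for a given $s$, then it is absolutely
convergent for all greater $s$. This is obvious because

\[ |\alpha_n n^{-s_2}| \le |\alpha_n n^{-s_1}| \]

when $n \ge 1$ and $s_2 > s_1$.

(2) If $\sum \alpha_n n^{-s}$ is absolutely convergent for $s > s_0$ then the equation
(17.1.1) may be differentiated term by term, so that

(17.1.2) \[ F'(s) = -\sum \frac{\alpha_n \log n}{n^{s}} \]

for $s > s_0$. To prove this, suppose that

\[ s_0 < s_0 + \delta = s_1 \le s \le s_2. \]

Then $\log n \le K(\delta)\, n^{\frac{1}{2}\delta}$, where $K(\delta)$ depends only on $\delta$, and

\[ \left| \frac{\alpha_n \log n}{n^{s}} \right| \le K(\delta) \left| \frac{\alpha_n}{n^{s_0 + \frac{1}{2}\delta}} \right| \]

for all $s$ of the interval $(s_1, s_2)$. Since

\[ \sum \left| \frac{\alpha_n}{n^{s_0 + \frac{1}{2}\delta}} \right| \]

is convergent, the series on the right of (17.1.2) is uniformly convergent in
$(s_1, s_2)$, and the differentiation is justifiable.

(3) If

\[ F(s) = \sum \alpha_n n^{-s} = 0 \]

for $s > s_0$, then $\alpha_n = 0$ for all $n$. To prove this, suppose that $\alpha_m$ is the first
non-zero coefficient. Then

(17.1.3) \[ 0 = F(s) = \alpha_m m^{-s} \left\{ 1 + \frac{\alpha_{m+1}}{\alpha_m} \left( \frac{m+1}{m} \right)^{-s} + \frac{\alpha_{m+2}}{\alpha_m} \left( \frac{m+2}{m} \right)^{-s} + \cdots \right\} = \alpha_m m^{-s} \{1 + G(s)\}, \]

say. If $s_0 < s_1 < s$, then

\[ \left( \frac{m+k}{m} \right)^{-s} \le \left( \frac{m+1}{m} \right)^{-(s - s_1)} \left( \frac{m+k}{m} \right)^{-s_1} \]

and

\[ |G(s)| \le \frac{1}{|\alpha_m|} \left( \frac{m+1}{m} \right)^{-(s - s_1)} m^{s_1} \sum_{k=1}^{\infty} \frac{|\alpha_{m+k}|}{(m+k)^{s_1}}, \]

which tends to 0 when $s \to \infty$. Hence

\[ |1 + G(s)| > \tfrac{1}{2} \]

for sufficiently large $s$; and (17.1.3) implies $\alpha_m = 0$, a contradiction.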





It follows that if

\[ \sum \alpha_n n^{-s} = \sum \beta_n n^{-s} \]

for $s > s_1$, then $\alpha_n = \beta_n$ for all $n$. We refer to this theorem as the ‘uniqueness
theorem’.

(4) Two absolutely convergent Dirichlet series may be multiplied in a 
manner explained in § 17.4. 

17.2. The zeta function. The simplest infinite Dirichlet series is

(17.2.1) \[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}. \]

It is convergent for $s > 1$, and its sum $\zeta(s)$ is called the Riemann zeta
function. In particular†

(17.2.2) \[ \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^{2}} = \frac{\pi^{2}}{6}. \]

If we differentiate (17.2.1) term by term with respect to $s$, we obtain

Theorem 279:

\[ \zeta'(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^{s}} \quad (s > 1). \]

The zeta function is fundamental in the theory of prime numbers. Its
importance depends on a remarkable identity discovered by Euler, which
expresses the function as a product extended over prime numbers only.

Theorem 280: If $s > 1$ then

\[ \zeta(s) = \prod_{p} \frac{1}{1 - p^{-s}}. \]

† $\zeta(2n)$ is a rational multiple of $\pi^{2n}$ for all positive integral $n$. Thus $\zeta(4) = \frac{1}{90}\pi^{4}$, and generally

\[ \zeta(2n) = \frac{2^{2n-1} B_n}{(2n)!}\, \pi^{2n}, \]

where $B_n$ is Bernoulli's number.

Since $p \ge 2$, we have

(17.2.3) \[ \frac{1}{1 - p^{-s}} = 1 + p^{-s} + p^{-2s} + \cdots \]

for $s > 1$ (indeed for $s > 0$). If we take $p = 2, 3, \ldots, P$, and multiply the
series together, the general term resulting is of the type

\[ 2^{-a_2 s}\, 3^{-a_3 s} \cdots P^{-a_P s} = n^{-s}, \]

where

\[ n = 2^{a_2} 3^{a_3} \cdots P^{a_P} \quad (a_2 \ge 0,\ a_3 \ge 0,\ \ldots,\ a_P \ge 0). \]

A number $n$ will occur if and only if it has no prime factors greater than $P$,
and then, by Theorem 2, once only. Hence

\[ \prod_{p \le P} \frac{1}{1 - p^{-s}} = \sum_{(P)} n^{-s}, \]

the summation on the right-hand side extending over numbers formed from
the primes up to $P$.

These numbers include all numbers up to $P$, so that

\[ 0 < \sum_{n=1}^{\infty} n^{-s} - \sum_{(P)} n^{-s} < \sum_{P+1}^{\infty} n^{-s}, \]

and the last sum tends to 0 when $P \to \infty$. Hence

\[ \sum_{n=1}^{\infty} n^{-s} = \lim_{P \to \infty} \sum_{(P)} n^{-s} = \lim_{P \to \infty} \prod_{p \le P} \frac{1}{1 - p^{-s}}, \]

the result of Theorem 280.

Theorem 280 may be regarded as an analytical expression of the 
fundamental theorem of arithmetic. 
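The inequalities used in the proof can be observed numerically for $s = 2$, where (17.2.2) gives the limit $\pi^2/6$. The sketch below is an illustration only; the sieve, the function names and the cut-off $P = 1000$ are assumptions chosen for the example.

```python
from math import pi


def primes_up_to(P):
    """Sieve of Eratosthenes: all primes p <= P."""
    sieve = [True] * (P + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(P ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, P + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def euler_product(s, P):
    """Finite product of 1 / (1 - p^(-s)) over the primes p <= P."""
    prod = 1.0
    for p in primes_up_to(P):
        prod *= 1.0 / (1.0 - p ** (-s))
    return prod


def partial_sum(s, N):
    """Sum of n^(-s) for 1 <= n <= N."""
    return sum(n ** (-s) for n in range(1, N + 1))


P = 1000
print(partial_sum(2, P) < euler_product(2, P) < pi ** 2 / 6)  # True
```

The finite product equals the sum of $n^{-s}$ over all numbers composed of primes up to $P$, so it lies strictly between the partial sum to $P$ and $\zeta(s)$, as in the proof.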


17.3. The behaviour of $\zeta(s)$ when $s \to 1$. We shall require later to
know how $\zeta(s)$ and $\zeta'(s)$ behave when $s$ tends to 1 through values greater
than 1.

We can write $\zeta(s)$ in the form

(17.3.1) \[ \zeta(s) = \sum_{1}^{\infty} n^{-s} = \int_{1}^{\infty} x^{-s}\,dx + \sum_{1}^{\infty} \int_{n}^{n+1} (n^{-s} - x^{-s})\,dx. \]

Here

\[ \int_{1}^{\infty} x^{-s}\,dx = \frac{1}{s - 1}, \]

since $s > 1$. Also

\[ 0 < n^{-s} - x^{-s} = \int_{n}^{x} s\,t^{-s-1}\,dt < \frac{s}{n^{2}}, \]

if $n < x < n + 1$, and so

\[ 0 < \int_{n}^{n+1} (n^{-s} - x^{-s})\,dx < \frac{s}{n^{2}}; \]

and the last term in (17.3.1) is positive and numerically less than $s \sum n^{-2}$.
Hence

Theorem 281:

\[ \zeta(s) = \frac{1}{s - 1} + O(1). \]

Also

\[ \log \zeta(s) = \log \frac{1}{s - 1} + \log\{1 + O(s - 1)\}, \]

and so

Theorem 282:

\[ \log \zeta(s) = \log \frac{1}{s - 1} + O(s - 1). \]

We may also argue with

\[ -\zeta'(s) = \sum_{1}^{\infty} n^{-s} \log n = \int_{1}^{\infty} x^{-s} \log x\,dx + \sum_{1}^{\infty} \int_{n}^{n+1} (n^{-s} \log n - x^{-s} \log x)\,dx \]

much as with $\zeta(s)$, and deduce

Theorem 283:

\[ \zeta'(s) = -\frac{1}{(s - 1)^{2}} + O(1). \]

In particular,

\[ \zeta(s) \sim \frac{1}{s - 1}. \]

This may also be proved by observing that, if $s > 1$,

\[ (1 - 2^{1-s})\,\zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots - 2(2^{-s} + 4^{-s} + 6^{-s} + \cdots) \]
\[ = 1^{-s} - 2^{-s} + 3^{-s} - \cdots, \]

and that the last series converges to $\log 2$ for $s = 1$. Hence†

\[ (s - 1)\,\zeta(s) = (1 - 2^{1-s})\,\zeta(s) \cdot \frac{s - 1}{1 - 2^{1-s}} \to \log 2 \cdot \frac{1}{\log 2} = 1. \]

† We assume here that

\[ \lim_{s \to 1} \sum \frac{a_n}{n^{s}} = \sum \frac{a_n}{n} \]

whenever the series on the right is convergent, a theorem not included in those of § 17.1. We do not
prove this theorem because we require it only for an alternative proof.

17.4. Multiplication of Dirichlet series. Suppose that we are given a
finite set of Dirichlet series

(17.4.1) \[ \sum \alpha_n n^{-s}, \quad \sum \beta_n n^{-s}, \quad \sum \gamma_n n^{-s}, \quad \ldots, \]

and that we multiply them together in the sense of forming all possible
products with one factor selected from each series. The general term
resulting is

\[ \alpha_u u^{-s} \cdot \beta_v v^{-s} \cdot \gamma_w w^{-s} \cdots = \alpha_u \beta_v \gamma_w \cdots n^{-s}, \]

where $n = uvw\ldots$. If now we add together all terms for which $n$ has a given
value, we obtain a single term $\chi_n n^{-s}$ where

(17.4.2) \[ \chi_n = \sum_{uvw\ldots = n} \alpha_u \beta_v \gamma_w \cdots. \]

The series $\sum \chi_n n^{-s}$, with $\chi_n$ defined by (17.4.2), is called the formal
product of the series (17.4.1).

The simplest case is that in which there are only two series (17.4.1),
$\sum \alpha_u u^{-s}$ and $\sum \beta_v v^{-s}$. If (changing our notation a little) we denote their
formal product by $\sum \gamma_n n^{-s}$, then

(17.4.3) \[ \gamma_n = \sum_{uv = n} \alpha_u \beta_v = \sum_{d \mid n} \alpha_d \beta_{n/d} = \sum_{d \mid n} \alpha_{n/d} \beta_d, \]

a sum of a type which occurred frequently in Ch. XVI. And if the two given
series are absolutely convergent, and their sums are $F(s)$ and $G(s)$, then

\[ F(s)G(s) = \sum_{u} \alpha_u u^{-s} \sum_{v} \beta_v v^{-s} = \sum_{u,v} \alpha_u \beta_v (uv)^{-s} = \sum_{n} n^{-s} \sum_{uv = n} \alpha_u \beta_v = \sum \gamma_n n^{-s}, \]

since we may multiply two absolutely convergent series and arrange the
terms of the product in any order that we please.

Theorem 284. If the series

\[ F(s) = \sum \alpha_u u^{-s}, \qquad G(s) = \sum \beta_v v^{-s} \]

are absolutely convergent, then

\[ F(s)G(s) = \sum \gamma_n n^{-s}, \]

where $\gamma_n$ is defined by (17.4.3).
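The coefficient rule (17.4.3) is purely arithmetical and can be computed without any reference to convergence. The following sketch is illustrative; the list-based representation (index 0 unused) and the name `dirichlet_product` are assumptions made for the example.

```python
def dirichlet_product(alpha, beta, N):
    """Coefficients gamma_1..gamma_N of the formal product, as in (17.4.3).

    alpha and beta are lists indexed from 1; index 0 is unused.
    """
    gamma = [0] * (N + 1)
    for u in range(1, N + 1):
        for v in range(1, N // u + 1):
            gamma[u * v] += alpha[u] * beta[v]
    return gamma


N = 30
one = [0] + [1] * N                       # coefficients of the series sum n^(-s)
identity = [0] + list(range(1, N + 1))    # coefficients alpha_n = n

square = dirichlet_product(one, one, N)   # gamma_n = number of divisors of n
sums = dirichlet_product(one, identity, N)  # gamma_n = sum of divisors of n
print(square[12], sums[12])  # 6 28
```

With both factors equal to $\sum n^{-s}$ the rule gives $\gamma_n = \sum_{d \mid n} 1 = d(n)$, and replacing one factor by $\sum n \cdot n^{-s}$ gives $\gamma_n = \sigma(n)$, in the notation of § 16.7.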
Conversely, if

\[ H(s) = \sum \delta_n n^{-s} = F(s)G(s) \]

then it follows from the uniqueness theorem of § 17.1 that $\delta_n = \gamma_n$.

Our definition of the formal product may be extended, with proper
precautions, to an infinite set of series. It is convenient to suppose that

\[ \alpha_1 = \beta_1 = \gamma_1 = \cdots = 1. \]

Then the term

\[ \alpha_u \beta_v \gamma_w \cdots \]

in (17.4.2) contains only a finite number of factors which are not 1, and we
may define $\chi_n$ by (17.4.2) whenever the series is absolutely convergent.†
The most important case is that in which $f(1) = 1$, $f(n)$ is multiplicative,
and the series (17.4.1) are

(17.4.4) \[ 1 + f(p)p^{-s} + f(p^{2})p^{-2s} + \cdots + f(p^{a})p^{-as} + \cdots \]

for $p = 2, 3, 5, \ldots$; so that, for example, $\alpha_u$ is $f(2^{a})$ when $u = 2^{a}$ and 0
otherwise. Then, after Theorem 2, every $n$ occurs just once as a product
$uvw\ldots$ with a non-zero coefficient, and

\[ \chi_n = f(p_1^{a_1})\, f(p_2^{a_2}) \cdots = f(n) \]

when $n = p_1^{a_1} p_2^{a_2} \cdots$. It will be observed that the series (17.4.2) reduces to
a single term, so that no question of convergence arises.

Hence

Theorem 285. If $f(1) = 1$ and $f(n)$ is multiplicative, then

\[ \sum f(n) n^{-s} \]

is the formal product of the series (17.4.4).

In particular, $\sum n^{-s}$ is the formal product of the series

\[ 1 + p^{-s} + p^{-2s} + \cdots. \]

† We must assume absolute convergence because we have not specified the order in which the terms
are to be taken.

Theorem 280 says in some ways more than this, namely that $\zeta(s)$, the
sum of the series $\sum n^{-s}$ when $s > 1$, is equal to the product of the sums
of the series $1 + p^{-s} + p^{-2s} + \cdots$. The proof can be generalized to cover the
more general case considered here.

Theorem 286. If $f(n)$ satisfies the conditions of Theorem 285, and

(17.4.5) \[ \sum |f(n)|\, n^{-s} \]

is convergent, then

\[ F(s) = \sum f(n) n^{-s} = \prod_{p} \left\{ 1 + f(p)p^{-s} + f(p^{2})p^{-2s} + \cdots \right\}. \]

We write

\[ F_p(s) = 1 + f(p)p^{-s} + f(p^{2})p^{-2s} + \cdots; \]

the absolute convergence of the series is a corollary of the convergence of
(17.4.5). Hence, arguing as in § 17.2, and using the multiplicative property
of $f(n)$, we obtain

\[ \prod_{p \le P} F_p(s) = \sum_{(P)} f(n) n^{-s}. \]

Since

\[ \left| \sum_{n=1}^{\infty} f(n) n^{-s} - \sum_{(P)} f(n) n^{-s} \right| \le \sum_{P+1}^{\infty} |f(n)|\, n^{-s} \to 0, \]

the result follows as in § 17.2.

17.5. The generating functions of some special arithmetical func- 
tions. The generating functions of most of the arithmetical functions which 
we have considered are simple combinations of zeta functions. In this 
section we work out some of the most important examples. 


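Theorem 287:

\[ \frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^{s}} \quad (s > 1). \]

This follows at once from Theorems 280, 262, and 286, since

\[ \frac{1}{\zeta(s)} = \prod_{p} (1 - p^{-s}) = \prod_{p} \left\{ 1 + \mu(p)p^{-s} + \mu(p^{2})p^{-2s} + \cdots \right\} = \sum_{n=1}^{\infty} \mu(n) n^{-s}. \]

Theorem 288:

\[ \frac{\zeta(s - 1)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\phi(n)}{n^{s}} \quad (s > 2). \]

By Theorem 287, Theorem 284, and (16.3.1)

\[ \frac{\zeta(s - 1)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{n}{n^{s}} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^{s}} = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \sum_{d \mid n} d\,\mu\!\left(\frac{n}{d}\right) = \sum_{n=1}^{\infty} \frac{\phi(n)}{n^{s}}. \]

Theorem 289:

\[ \zeta^{2}(s) = \sum_{n=1}^{\infty} \frac{d(n)}{n^{s}} \quad (s > 1). \]

Theorem 290:

\[ \zeta(s)\,\zeta(s - 1) = \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^{s}} \quad (s > 2). \]

These are special cases of the theorem

Theorem 291:

\[ \zeta(s)\,\zeta(s - k) = \sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^{s}} \quad (s > 1,\ s > k + 1). \]

In fact

\[ \zeta(s)\,\zeta(s - k) = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \sum_{n=1}^{\infty} \frac{n^{k}}{n^{s}} = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \sum_{d \mid n} d^{k} = \sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^{s}}, \]

by Theorem 284.

Theorem 292:

\[ \frac{\sigma_{s-1}(m)}{m^{s-1}\,\zeta(s)} = \sum_{n=1}^{\infty} \frac{c_n(m)}{n^{s}} \quad (s > 1). \]

By Theorem 271,

\[ c_n(m) = \sum_{d \mid m,\ d \mid n} \mu\!\left(\frac{n}{d}\right) d = \sum_{d \mid m,\ dd' = n} \mu(d')\,d; \]

and so

\[ \sum_{n=1}^{\infty} \frac{c_n(m)}{n^{s}} = \sum_{n=1}^{\infty} \sum_{d \mid m,\ dd' = n} \frac{\mu(d')\,d}{d'^{s} d^{s}} = \sum_{d'=1}^{\infty} \frac{\mu(d')}{d'^{s}} \sum_{d \mid m} \frac{1}{d^{s-1}} = \frac{1}{\zeta(s)} \sum_{d \mid m} \frac{1}{d^{s-1}}. \]

Finally

\[ \sum_{d \mid m} d^{1-s} = m^{1-s} \sum_{d \mid m} \left( \frac{m}{d} \right)^{s-1} = m^{1-s} \sum_{d \mid m} d^{s-1} = m^{1-s}\,\sigma_{s-1}(m). \]

In particular,

Theorem 293:

\[ \sum_{n=1}^{\infty} \frac{c_n(m)}{n^{2}} = \frac{6}{\pi^{2}}\,\frac{\sigma(m)}{m}. \]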

17.6. The analytical interpretation of the Möbius formula. Suppose
that

\[ g(n) = \sum_{d \mid n} f(d), \]

and that $F(s)$ and $G(s)$ are the generating functions of $f(n)$ and $g(n)$. Then,
if the series are absolutely convergent, we have

\[ F(s)\,\zeta(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^{s}} \sum_{n=1}^{\infty} \frac{1}{n^{s}} = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \sum_{d \mid n} f(d) = \sum_{n=1}^{\infty} \frac{g(n)}{n^{s}} = G(s); \]

and therefore

\[ F(s) = \frac{G(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{g(n)}{n^{s}} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^{s}} = \sum_{n=1}^{\infty} \frac{h(n)}{n^{s}}, \]

where

\[ h(n) = \sum_{d \mid n} g(d)\,\mu\!\left(\frac{n}{d}\right). \]

It then follows from the uniqueness theorem of § 17.1 (3) that

\[ h(n) = f(n), \]

which is the inversion formula of Möbius (Theorem 266). This formula then
appears as an arithmetical expression of the equivalence of the equations

\[ G(s) = \zeta(s)F(s), \qquad F(s) = \frac{G(s)}{\zeta(s)}. \]

We cannot regard this argument, as it stands, as a proof of the Möbius formula,
since it depends upon the convergence of the series for $F(s)$. This
hypothesis involves a limitation on the order of magnitude of $f(n)$, and
it is obvious that such limitations are irrelevant. The ‘real’ proof of the
Möbius formula is that given in § 16.4.

We may, however, take this opportunity of expanding some remarks which we made in
§ 17.1. We could construct a formal theory of Dirichlet series in which ‘analysis’ played no
part. This theory would include all identities of the ‘Möbius’ type, but the notions of the
sum of an infinite series, or the value of an infinite product, would never occur. We shall
not attempt to construct such a theory in detail, but it is interesting to consider how it would
begin.

We denote the formal series $\sum a_n n^{-s}$ by $A$, and write

\[ A = \sum a_n n^{-s}. \]

In particular we write

\[ I = 1 \cdot 1^{-s} + 0 \cdot 2^{-s} + 0 \cdot 3^{-s} + \cdots, \]
\[ Z = 1 \cdot 1^{-s} + 1 \cdot 2^{-s} + 1 \cdot 3^{-s} + \cdots, \]
\[ M = \mu(1) 1^{-s} + \mu(2) 2^{-s} + \mu(3) 3^{-s} + \cdots. \]

By

\[ A = B \]

we mean that $a_n = b_n$ for all values of $n$.

The equation

\[ A \times B = C \]

means that $C$ is the formal product of $A$ and $B$, in the sense of § 17.4. The definition may
be extended, as in § 17.4, to the product of any finite number of series, or, with proper
precautions, of an infinity. It is plain from the definition that

\[ A \times B = B \times A, \qquad A \times B \times C = (A \times B) \times C = A \times (B \times C), \]

and so on, and that

\[ A \times I = A. \]

The equation

\[ A \times Z = B \]

means that

\[ b_n = \sum_{d \mid n} a_d. \]

Let us suppose that there is a series $L$ such that

\[ Z \times L = I. \]

Then

\[ A = A \times I = A \times (Z \times L) = (A \times Z) \times L = B \times L, \]

i.e.

\[ a_n = \sum_{d \mid n} b_d\, l_{n/d}. \]

The Möbius formula asserts that $l_n = \mu(n)$, or that $L = M$, or that

(17.6.1) \[ Z \times M = I; \]

and this means that

\[ \sum_{d \mid n} \mu(d) \]

is 1 when $n = 1$ and 0 when $n > 1$ (Theorem 263).

We may prove this as in § 16.3, or we may continue as follows. We write

\[ P_p = 1 - p^{-s}, \qquad Q_p = 1 + p^{-s} + p^{-2s} + \cdots, \]

where $p$ is a prime (so that $P_p$, for example, is the series $A$ in which $a_1 = 1$, $a_p = -1$, and
the remaining coefficients are 0); and calculate the coefficient of $n^{-s}$ in the formal product
of $P_p$ and $Q_p$. This coefficient is 1 if $n = 1$, $1 - 1 = 0$ if $n$ is a positive power of $p$, and 0 in
all other cases; so that

\[ P_p \times Q_p = I \]

for every $p$.

The series $P_p$, $Q_p$, and $I$ are of the special type considered in § 17.4; and

\[ Z = \prod Q_p, \qquad M = \prod P_p, \]
\[ Z \times M = \prod Q_p \times \prod P_p, \]

while

\[ \prod (Q_p \times P_p) = \prod I = I. \]

But the coefficient of $n^{-s}$ in

\[ (Q_2 \times Q_3 \times Q_5 \times \cdots) \times (P_2 \times P_3 \times P_5 \times \cdots) \]

(a product of two series of the general type) is the same as in

\[ Q_2 \times P_2 \times Q_3 \times P_3 \times Q_5 \times P_5 \times \cdots \]

or in

\[ (Q_2 \times P_2) \times (Q_3 \times P_3) \times (Q_5 \times P_5) \times \cdots \]

(which are each products of an infinity of series of the special type); in each case the $\chi_n$ of
§ 17.4 contains only a finite number of terms. Hence

\[ Z \times M = \prod Q_p \times \prod P_p = \prod (Q_p \times P_p) = \prod I = I. \]

It is plain that this proof of (17.6.1) is, at bottom, merely a translation into a different 
language of that of § 16.3; and that, in a simple case like this, we gain nothing by the 
translation. More complicated formulae become much easier to grasp and prove when 
stated in the language of infinite series and products, and it is important to realize that we 
can use it without analytical assumptions. In what follows, however, we continue to use the 
language of ordinary analysis. 

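17.7. The function $\Lambda(n)$. The function $\Lambda(n)$, which is particularly
important in the analytical theory of primes, is defined by

\[ \Lambda(n) = \log p \quad (n = p^{m}), \]
\[ \Lambda(n) = 0 \quad (n \ne p^{m}), \]

i.e. as being $\log p$ when $n$ is a prime $p$ or one of its powers, and 0 otherwise.

From Theorem 280, we have

\[ \log \zeta(s) = \sum_{p} \log \frac{1}{1 - p^{-s}}. \]

Differentiating with respect to $s$, and observing that

\[ \frac{d}{ds} \log \frac{1}{1 - p^{-s}} = -\frac{\log p}{p^{s} - 1}, \]

we obtain

(17.7.1) \[ -\frac{\zeta'(s)}{\zeta(s)} = \sum_{p} \frac{\log p}{p^{s} - 1}. \]

The differentiation is legitimate because the derived series is uniformly
convergent for $s \ge 1 + \delta > 1$.†

† The $n$th prime $p_n$ is greater than $n$, and the series may be compared with $\sum n^{-s} \log n$.

We may write (17.7.1) in the form

\[ -\frac{\zeta'(s)}{\zeta(s)} = \sum_{p} \log p \sum_{m=1}^{\infty} p^{-ms} \]

and the double series $\sum \sum p^{-ms} \log p$ is absolutely convergent when $s > 1$.
Hence it may be written as

\[ \sum_{p,m} p^{-ms} \log p = \sum \Lambda(n) n^{-s}, \]

by the definition of $\Lambda(n)$.

Theorem 294:

\[ -\frac{\zeta'(s)}{\zeta(s)} = \sum \Lambda(n) n^{-s} \quad (s > 1). \]

Since

\[ -\zeta'(s) = \sum_{n=1}^{\infty} \frac{\log n}{n^{s}}, \]

it follows that

\[ \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{s}} = -\frac{\zeta'(s)}{\zeta(s)} = \frac{1}{\zeta(s)} \sum_{n=1}^{\infty} \frac{\log n}{n^{s}} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^{s}} \sum_{n=1}^{\infty} \frac{\log n}{n^{s}}, \]

and

\[ \sum_{n=1}^{\infty} \frac{\log n}{n^{s}} = -\zeta'(s) = \zeta(s) \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{s}} = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{s}}. \]

From these equations, and the uniqueness theorem of § 17.1, we deduce†

Theorem 295:

\[ \Lambda(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) \log d. \]

Theorem 296:

\[ \log n = \sum_{d \mid n} \Lambda(d). \]

We may also prove these theorems directly. If $n = \prod p^{a}$, then

\[ \sum_{d \mid n} \Lambda(d) = \sum_{p^{\alpha} \mid n} \log p. \]

The summation extends over all values of $p$, and all positive values of $\alpha$
for which $p^{\alpha} \mid n$, so that $\log p$ occurs $a$ times. Hence

\[ \sum_{p^{\alpha} \mid n} \log p = \sum a \log p = \log \prod p^{a} = \log n. \]

This proves Theorem 296, and Theorem 295 follows by Theorem 266.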
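Theorems 295 and 296 can be verified in floating-point arithmetic for small $n$. The sketch below is an illustration only; the function names and the tolerance are assumptions, and the comparisons use `math.isclose` because logarithms are not exact.

```python
from math import log, isclose


def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def von_mangoldt(n):
    """Lambda(n) = log p if n is a positive power of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)            # n itself is prime


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


n = 360
lhs_296 = sum(von_mangoldt(d) for d in divisors(n))           # Theorem 296
rhs_295 = sum(mobius(n // d) * log(d) for d in divisors(n))   # Theorem 295
print(isclose(lhs_296, log(n)), isclose(rhs_295, von_mangoldt(n), abs_tol=1e-9))
```

For $n = 360 = 2^3 \cdot 3^2 \cdot 5$ the first sum is $3\log 2 + 2\log 3 + \log 5$, and the second vanishes because 360 is not a prime power.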
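† Compare § 17.6.

Again

\[ \frac{d}{ds} \left\{ \frac{1}{\zeta(s)} \right\} = -\frac{\zeta'(s)}{\zeta^{2}(s)} = \frac{1}{\zeta(s)} \left\{ -\frac{\zeta'(s)}{\zeta(s)} \right\}, \]

so that

\[ \sum_{n=1}^{\infty} \frac{\mu(n) \log n}{n^{s}} = -\sum_{n=1}^{\infty} \frac{\mu(n)}{n^{s}} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{s}}. \]

Hence, as before, we deduce

Theorem 297:

\[ -\mu(n) \log n = \sum_{d \mid n} \mu(d)\,\Lambda\!\left(\frac{n}{d}\right). \]

Similarly

\[ -\frac{\zeta'(s)}{\zeta(s)} = \zeta(s)\,\frac{d}{ds} \left\{ \frac{1}{\zeta(s)} \right\}, \]

and from this (or from Theorems 297 and 267) we deduce

Theorem 298:

\[ \Lambda(n) = -\sum_{d \mid n} \mu(d) \log d. \]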

17.8. Further examples of generating functions. We add a few
examples of a more miscellaneous character. We define $d_k(n)$ as the number
of ways of expressing $n$ as the product of $k$ positive factors (of which
any number may be unity), expressions in which only the order of the
factors is different being regarded as distinct. In particular, $d_2(n) = d(n)$.
Then

Theorem 299:

\[ \zeta^{k}(s) = \sum \frac{d_k(n)}{n^{s}} \quad (s > 1). \]

Theorem 289 is a particular case of this theorem.

Again

\[ \frac{\zeta(2s)}{\zeta(s)} = \prod_{p} \left( \frac{1 - p^{-s}}{1 - p^{-2s}} \right) = \prod_{p} \left( \frac{1}{1 + p^{-s}} \right) = \prod_{p} \left( 1 - p^{-s} + p^{-2s} - \cdots \right) = \sum_{n=1}^{\infty} \frac{\lambda(n)}{n^{s}}, \]

where $\lambda(n) = (-1)^{\rho}$, $\rho$ being the total number of prime factors of $n$, when
multiple factors are counted multiply. Thus

Theorem 300:

\[ \frac{\zeta(2s)}{\zeta(s)} = \sum \frac{\lambda(n)}{n^{s}} \quad (s > 1). \]

Similarly we can prove

Theorem 301:

\[ \frac{\zeta^{2}(s)}{\zeta(2s)} = \sum \frac{2^{\omega(n)}}{n^{s}} \quad (s > 1), \]

where $\omega(n)$ is the number of different prime factors of $n$.

A number $n$ is said to be squarefree† if it has no squared factor. If we
write $q(n) = 1$ when $n$ is squarefree, and $q(n) = 0$ when $n$ has a squared
factor, so that $q(n) = |\mu(n)|$, then

\[ \frac{\zeta(s)}{\zeta(2s)} = \prod_{p} \left( \frac{1 - p^{-2s}}{1 - p^{-s}} \right) = \prod_{p} (1 + p^{-s}) = \sum_{n=1}^{\infty} \frac{q(n)}{n^{s}} \quad (s > 1), \]

by Theorems 280 and 286. Thus

Theorem 302:

\[ \frac{\zeta(s)}{\zeta(2s)} = \sum_{n=1}^{\infty} \frac{q(n)}{n^{s}} = \sum_{n=1}^{\infty} \frac{|\mu(n)|}{n^{s}} \quad (s > 1). \]

† Some writers (in English) use the German word ‘quadratfrei’.

More generally, if $q_k(n) = 0$ or 1 according as $n$ has or has not a $k$th
power as a factor, then

Theorem 303:

\[ \frac{\zeta(s)}{\zeta(ks)} = \sum \frac{q_k(n)}{n^{s}} \quad (s > 1). \]

Another example, due to Ramanujan, is

Theorem 304:

\[ \frac{\zeta^{4}(s)}{\zeta(2s)} = \sum \frac{\{d(n)\}^{2}}{n^{s}} \quad (s > 1). \]

This may be proved as follows. We have

\[ \frac{\zeta^{4}(s)}{\zeta(2s)} = \prod_{p} \frac{1 - p^{-2s}}{(1 - p^{-s})^{4}} = \prod_{p} \frac{1 + p^{-s}}{(1 - p^{-s})^{3}}. \]

Now

\[ \frac{1 + x}{(1 - x)^{3}} = (1 + x)(1 + 3x + 6x^{2} + \cdots) = 1 + 4x + 9x^{2} + \cdots = \sum_{l=0}^{\infty} (l + 1)^{2} x^{l}. \]

Hence

\[ \frac{\zeta^{4}(s)}{\zeta(2s)} = \prod_{p} \left\{ \sum_{l=0}^{\infty} (l + 1)^{2} p^{-ls} \right\}. \]

The coefficient of $n^{-s}$, when $n = p_1^{l_1} p_2^{l_2} \cdots$, is

\[ (l_1 + 1)^{2} (l_2 + 1)^{2} \cdots = \{d(n)\}^{2}, \]

by Theorem 273.

More generally we can prove, by similar reasoning,

Theorem 305. If $s$, $s - a$, $s - b$, and $s - a - b$ are all greater than 1, then

\[ \frac{\zeta(s)\,\zeta(s - a)\,\zeta(s - b)\,\zeta(s - a - b)}{\zeta(2s - a - b)} = \sum_{n=1}^{\infty} \frac{\sigma_a(n)\,\sigma_b(n)}{n^{s}}. \]

17.9. The generating function of r(n). We saw in § 16.10 that

$$r(n) = 4 \sum_{d \mid n} \chi(d),$$

where $\chi(n)$ is 0 when $n$ is even and $(-1)^{\frac{1}{2}(n-1)}$ when $n$ is odd. Hence

$$\sum \frac{r(n)}{n^s} = 4 \sum \frac{1}{n^s} \sum \frac{\chi(n)}{n^s} = 4\zeta(s)L(s),$$

where

$$L(s) = 1^{-s} - 3^{-s} + 5^{-s} - \cdots,$$

if $s > 1$.

Theorem 306:

$$\sum \frac{r(n)}{n^s} = 4\zeta(s)L(s) \qquad (s > 1).$$

The function

$$\eta(s) = 1^{-s} - 2^{-s} + 3^{-s} - \cdots$$

is expressible in terms of $\zeta(s)$ by the formula

$$\eta(s) = (1 - 2^{1-s})\zeta(s);$$

but $L(s)$, which can also be expressed in the form

$$L(s) = \prod_p \left(\frac{1}{1 - \chi(p)p^{-s}}\right),$$

is an independent function. It is the basis of the analytical theory of the
distribution of primes in the progressions $4m + 1$ and $4m + 3$.
17.10. Generating functions of other types. The generating functions
discussed in this chapter have been defined by Dirichlet series; but any
function

$$F(s) = \sum a_n u_n(s)$$

may be regarded as a generating function of $a_n$. The most usual form of
$u_n(s)$ is

$$u_n(s) = e^{-\lambda_n s},$$

where $\lambda_n$ is a sequence of positive numbers which increases steadily to
infinity. The most important cases are the cases $\lambda_n = \log n$ and $\lambda_n = n$.
When $\lambda_n = \log n$, $u_n(s) = n^{-s}$ and the series is a Dirichlet series. When
$\lambda_n = n$, it is a power series in

$$x = e^{-s}.$$

Since

$$m^{-s} \cdot n^{-s} = (mn)^{-s},$$

and

$$x^m \cdot x^n = x^{m+n},$$

the first type of series is more important in the ‘multiplicative’ side of
the theory of numbers (and in particular in the theory of primes). Such
functions as

$$\sum \mu(n)x^n, \qquad \sum \phi(n)x^n, \qquad \sum \Lambda(n)x^n$$

are extremely difficult to handle. But generating functions defined by power
series are dominant in the ‘additive’ theory.†

Another interesting type of series is obtained by taking

$$u_n(s) = \frac{e^{-ns}}{1 - e^{-ns}} = \frac{x^n}{1 - x^n}.$$

† See Chs. XIX-XXI.
We write

$$F(x) = \sum_{n=1}^{\infty} a_n \frac{x^n}{1 - x^n},$$

and disregard questions of convergence, which are not interesting here.†
A series of this type is called a ‘Lambert series’. Then

$$F(x) = \sum_{n=1}^{\infty} a_n \sum_{m=1}^{\infty} x^{mn} = \sum_{N=1}^{\infty} b_N x^N,$$

where

$$b_N = \sum_{n \mid N} a_n.$$

This relation between the $a$ and $b$ is that considered in §§ 16.4 and 17.6,
and it is equivalent to

$$\zeta(s)f(s) = g(s),$$

where $f(s)$ and $g(s)$ are the Dirichlet series associated with $a_n$ and $b_n$.
Theorem 307. If

$$f(s) = \sum a_n n^{-s}, \qquad g(s) = \sum b_n n^{-s},$$

then

$$F(x) = \sum a_n \frac{x^n}{1 - x^n} = \sum b_n x^n$$

if and only if

$$\zeta(s)f(s) = g(s).$$

If $f(s) = \sum \mu(n)n^{-s}$, $g(s) = 1$, by Theorem 287. If $f(s) = \sum \phi(n)n^{-s}$,

$$g(s) = \zeta(s - 1) = \sum \frac{n}{n^s},$$

by Theorem 288. Hence we derive

† All the series of this kind which we consider are absolutely convergent when $0 \le x < 1$.
Theorem 308:

$$\sum_{n=1}^{\infty} \frac{\mu(n)x^n}{1 - x^n} = x.$$

Theorem 309:

$$\sum_{n=1}^{\infty} \frac{\phi(n)x^n}{1 - x^n} = \frac{x}{(1 - x)^2}.$$
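The passage from a Lambert series to a power series is just the rule $b_N = \sum_{n \mid N} a_n$, which is easy to compute. The sketch below is an illustration under that assumption (the helper names are hypothetical). It recovers Theorems 308 and 309 for a truncated range: the coefficients $\mu(n)$ give the power series $x$, and the coefficients $\phi(n)$ give $x + 2x^2 + 3x^3 + \cdots = x/(1 - x)^2$.

```python
from math import gcd


def lambert_to_power_series(a, limit):
    """Given a[1..limit], return b[0..limit] with b[N] = sum of a[n] over n | N.

    b[N] is the coefficient of x^N in sum_n a[n] * x^n / (1 - x^n); b[0] is 0.
    """
    b = [0] * (limit + 1)
    for n in range(1, limit + 1):
        for multiple in range(n, limit + 1, n):
            b[multiple] += a[n]
    return b


def mobius(n):
    """The Moebius function mu(n), by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def euler_phi(n):
    """phi(n): the number of k in 1..n prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
```

Taking `a[n] = 1` for every `n` in the same routine yields the coefficients $d(N)$, the divisor-counting series.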


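Similarly, from Theorems 289 and 306, we deduce

Theorem 310:

$$\sum_{n=1}^{\infty} d(n)x^n = \frac{x}{1 - x} + \frac{x^2}{1 - x^2} + \frac{x^3}{1 - x^3} + \cdots.$$

Theorem 311:

$$\sum_{n=1}^{\infty} r(n)x^n = 4\left(\frac{x}{1 - x} - \frac{x^3}{1 - x^3} + \frac{x^5}{1 - x^5} - \cdots\right).$$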


Theorem 311 is equivalent to a famous identity in the theory of elliptic
functions, viz.

Theorem 312:

$$(1 + 2x + 2x^4 + 2x^9 + \cdots)^2 = 1 + 4\left(\frac{x}{1 - x} - \frac{x^3}{1 - x^3} + \frac{x^5}{1 - x^5} - \cdots\right).$$

In fact, if we square the series

$$1 + 2x + 2x^4 + 2x^9 + \cdots = \sum_{m=-\infty}^{\infty} x^{m^2},$$

the coefficient of $x^n$ is $r(n)$, since every pair $(m_1, m_2)$ for which $m_1^2 + m_2^2 = n$
contributes a unit to it.†
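The content of Theorems 311 and 312 is that $r(n)$, counted directly as a number of pairs, agrees with $4\sum_{d \mid n} \chi(d)$. A small Python sketch (illustrative only; the function names are hypothetical) makes the comparison concrete.

```python
from math import isqrt


def r_by_counting(n):
    """r(n): integer pairs (x, y), signs and order distinguished, with x^2 + y^2 = n."""
    count = 0
    bound = isqrt(n)
    for x in range(-bound, bound + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            count += 1 if y == 0 else 2  # y and -y are distinct unless y = 0
    return count


def chi(n):
    """chi(n) = 0 for even n, and (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def r_by_divisors(n):
    """4 times the sum of chi(d) over the divisors d of n."""
    return 4 * sum(chi(d) for d in range(1, n + 1) if n % d == 0)
```

For $n = 5$ both functions return 8, in agreement with the eight pairs derived from (2, 1) and (1, 2) by changes of sign.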


NOTES 

§ 17.1. There is a short account of the analytical theory of Dirichlet series in Titchmarsh,
Theory of functions, ch. ix; and fuller accounts, including the theory of series of the more
general type

$$\sum a_n e^{-\lambda_n s}$$

(referred to in § 17.10) in Hardy and Riesz, The general theory of Dirichlet’s series
(Cambridge Math. Tracts, no. 18, 1915), and Landau, Handbuch, 103-24, 723-75.

§ 17.2. There is a large literature concerned with the zeta function and its application to 
the theory of primes. See in particular the books of Ingham and Landau, Titchmarsh, The 
Riemann zeta-function (Oxford, 1951) and Edwards, Riemann’s zeta-function (New York,
Academic Press, 1974), the last especially from the historical point of view. 

For the value of $\zeta(2n)$ see Bromwich, Infinite series, ed. 2, 298.

§ 17.3. The proof of Theorem 283 depends on the formulae

$$0 < n^{-s}\log n - x^{-s}\log x = \int_n^x t^{-s-1}(s\log t - 1)\,dt < \frac{s}{n^2}\log(n + 1),$$

valid for $3 \le n < x < n + 1$ and $s > 1$.

There are proofs of the theorem referred to in the footnote to p. 247 in Landau, Handbuch,
106-7, and Titchmarsh, Theory of functions, 289-90.

§§ 17.5-10. Many of the identities in these sections, and others of similar character,
occur in Pólya and Szegő, Nos. 38-83. Some of them go back to Euler. We do not attempt
to assign them systematically to their discoverers, but Theorems 304 and 305 were first
stated by Ramanujan in the Messenger of Math. 45 (1916), 81-84 (Collected papers, 133-5
and 185).

§ 17.6. The discussion in small print was the result of conversation with Professor 
Harald Bohr. 

§ 17.10. Theorem 312 is due to Jacobi, Fundamenta nova (1829), § 40 (4) and § 65 (6). 


† Thus 5 arises from 8 pairs, viz. (2, 1), (1, 2), and those derived by changes of sign.
18.1. The order of d(n). In the last chapter we discussed formal
relations satisfied by certain arithmetical functions, such as $d(n)$, $\sigma(n)$,
and $\phi(n)$. We now consider the behaviour of these functions for large values
of $n$, beginning with $d(n)$. It is obvious that $d(n) \ge 2$ when $n > 1$,
while $d(n) = 2$ if $n$ is a prime. Hence

Theorem 313. The lower limit of $d(n)$ as $n \to \infty$ is 2:

$$\liminf_{n \to \infty} d(n) = 2.$$
It is less trivial to find any upper bound for the order of magnitude of d(n). 
We first prove a negative theorem. 

Theorem 314. The order of magnitude of $d(n)$ is sometimes larger than
that of any power of $\log n$: the equation

$$d(n) = O\{(\log n)^{A}\} \tag{18.1.1}$$

is false for every $A$.†

If $n = 2^m$, then

$$d(n) = m + 1 \sim \frac{\log n}{\log 2}.$$

If $n = (2 \cdot 3)^m$, then

$$d(n) = (m + 1)^2 \sim \left(\frac{\log n}{\log 6}\right)^2;$$

and so on. If

$$l \le A < l + 1$$

and

$$n = (2 \cdot 3 \cdots p_{l+1})^m,$$

then

$$d(n) = (m + 1)^{l+1} \sim \left(\frac{\log n}{\log(2 \cdot 3 \cdots p_{l+1})}\right)^{l+1} > K(\log n)^{l+1},$$

where $K$ is independent of $n$. Hence (18.1.1) is false for an infinite sequence
of values of $n$.

† The symbols $O$, $o$, $\sim$ were defined in § 1.6.

On the other hand we can prove 


Theorem 315:

$$d(n) = O(n^{\delta})$$

for all positive $\delta$.

The assertions that $d(n) = O(n^{\delta})$, for all positive $\delta$, and that $d(n) = o(n^{\delta})$,
for all positive $\delta$, are equivalent, since $n^{\delta'} = o(n^{\delta})$ when $0 < \delta' < \delta$.
We require the lemma

Theorem 316. If $f(n)$ is multiplicative, and $f(p^m) \to 0$ as $p^m \to \infty$,
then $f(n) \to 0$ as $n \to \infty$.

Given any positive $\epsilon$, we have

(i) $|f(p^m)| < A$ for all $p$ and $m$,

(ii) $|f(p^m)| < 1$ if $p^m > B$,

(iii) $|f(p^m)| < \epsilon$ if $p^m > N(\epsilon)$,

where $A$ and $B$ are independent of $p$, $m$, and $\epsilon$, and $N(\epsilon)$ depends on $\epsilon$ only.
If

$$n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r},$$

then

$$f(n) = f(p_1^{a_1}) f(p_2^{a_2}) \cdots f(p_r^{a_r}).$$

Of the factors $p_1^{a_1}, p_2^{a_2}, \ldots$, not more than $C$ are less than or equal to $B$, $C$
being independent of $n$ and $\epsilon$. The product of the corresponding factors
$f(p^a)$ is numerically less than $A^C$, and the rest of the factors of $f(n)$ are
numerically less than 1.

The number of integers which can be formed by the multiplication of
factors $p^a \le N(\epsilon)$ is $M(\epsilon)$, and every such number is less than $P(\epsilon)$, $M(\epsilon)$
and $P(\epsilon)$ depending only on $\epsilon$. Hence, if $n > P(\epsilon)$ there is at least one
factor $p^a$ of $n$ such that $p^a > N(\epsilon)$ and then, by (iii),

$$|f(p^a)| < \epsilon.$$

It follows that

$$|f(n)| < A^C \epsilon,$$

when $n > P(\epsilon)$, and therefore that $f(n) \to 0$.

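To deduce Theorem 315, we take $f(n) = n^{-\delta} d(n)$. Then $f(n)$ is
multiplicative, by Theorem 273, and

$$f(p^m) = \frac{m + 1}{p^{m\delta}} \le \frac{2m}{p^{m\delta}} = \frac{2}{p^{m\delta}} \frac{\log p^m}{\log p} \le \frac{2}{\log 2} \frac{\log p^m}{(p^m)^{\delta}} \to 0$$

when $p^m \to \infty$. Hence $f(n) \to 0$ when $n \to \infty$, and this is Theorem 315
(with $o$ for $O$).

We can also prove Theorem 315 directly. By Theorem 273,

$$\frac{d(n)}{n^{\delta}} = \prod_{i=1}^{r} \left(\frac{a_i + 1}{p_i^{a_i \delta}}\right). \tag{18.1.2}$$

Since

$$a\delta \log 2 \le e^{a\delta \log 2} = 2^{a\delta} \le p^{a\delta},$$

we have

$$\frac{a + 1}{p^{a\delta}} \le 1 + \frac{a}{p^{a\delta}} \le 1 + \frac{1}{\delta \log 2} \le \exp\left(\frac{1}{\delta \log 2}\right).$$

We use this in (18.1.2) for those $p$ which are less than $2^{1/\delta}$; there are less
than $2^{1/\delta}$ such primes. If $p \ge 2^{1/\delta}$, we have

$$p^{\delta} \ge 2, \qquad \frac{a + 1}{p^{a\delta}} \le \frac{a + 1}{2^a} \le 1.$$

Hence

$$\frac{d(n)}{n^{\delta}} \le \prod_{p < 2^{1/\delta}} \exp\left(\frac{1}{\delta \log 2}\right) < \exp\left(\frac{2^{1/\delta}}{\delta \log 2}\right) = O(1). \tag{18.1.3}$$

This is Theorem 315.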





We can use this type of argument to improve on Theorem 315. We
suppose $\epsilon > 0$ and replace $\delta$ in the last paragraph by

$$\alpha = \frac{(1 + \frac{1}{2}\epsilon)\log 2}{\log\log n}.$$

Nothing is changed until we reach the final step in (18.1.3) since it is here
that, for the first time, we use the fact that $\delta$ is independent of $n$. This time
we have

$$\log\left(\frac{d(n)}{n^{\alpha}}\right) < \frac{2^{1/\alpha}}{\alpha \log 2} = \frac{(\log n)^{1/(1 + \frac{1}{2}\epsilon)} \log\log n}{(1 + \frac{1}{2}\epsilon)\log^2 2} \le \frac{\epsilon \log 2 \log n}{2 \log\log n}$$

for all $n > n_0(\epsilon)$ (by the remark at the top of p. 9). Hence

$$\log d(n) \le \alpha \log n + \frac{\epsilon \log 2 \log n}{2 \log\log n} = \frac{(1 + \epsilon)\log 2 \log n}{\log\log n}.$$

We have thus proved part of

Theorem 317:

$$\limsup_{n \to \infty} \frac{\log d(n) \log\log n}{\log n} = \log 2;$$

that is, if $\epsilon > 0$ then

$$d(n) < 2^{(1 + \epsilon)\log n / \log\log n}$$

for all $n > n_0(\epsilon)$ and

$$d(n) > 2^{(1 - \epsilon)\log n / \log\log n} \tag{18.1.4}$$

for an infinity of values of $n$.

Thus the true ‘maximum order’ of $d(n)$ is about

$$2^{\log n / \log\log n}.$$

It follows from Theorem 315 that

$$\frac{\log d(n)}{\log n} \to 0$$

and so

$$d(n) = n^{\log d(n)/\log n} = n^{\epsilon_n},$$

where $\epsilon_n \to 0$ as $n \to \infty$. On the other hand, since

$$2^{\log n / \log\log n} = n^{\log 2 / \log\log n}$$

and $\log\log n$ tends very slowly to infinity, $\epsilon_n$ tends very slowly to 0. To put
it roughly, $d(n)$ is, for some $n$, much more like a power of $n$ than a power
of $\log n$. But this happens only very rarely† and, as Theorem 313 shows,
$d(n)$ is sometimes quite small.

To complete the proof of Theorem 317, we have to prove (18.1.4) for a
suitable sequence of $n$. We take $n$ to be the product of the first $r$ primes, so
that

$$n = 2 \cdot 3 \cdot 5 \cdot 7 \cdots P, \qquad d(n) = 2^r = 2^{\pi(P)},$$

where $P$ is the $r$th prime. It is reasonable to expect that such a choice of $n$
will give us a large value of $d(n)$. The function

$$\vartheta(x) = \sum_{p \le x} \log p$$

is discussed in Ch. XXII, where we shall prove (Theorem 414) that

$$\vartheta(x) > Ax$$

for some fixed positive $A$ and all $x \ge 2$.‡ We have then

$$AP < \vartheta(P) = \sum_{p \le P} \log p = \log n,$$

$$\pi(P)\log P = \log P \sum_{p \le P} 1 \ge \vartheta(P) = \log n,$$

and so

$$\log d(n) = \pi(P)\log 2 \ge \frac{\log n \log 2}{\log P} > \frac{\log n \log 2}{\log\log n - \log A} > \frac{(1 - \epsilon)\log n \log 2}{\log\log n}$$

for $n > n_0(\epsilon)$.
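To see how slowly the limit in Theorem 317 is approached, one can evaluate $\log d(n) \log\log n / \log n$ along the products of the first $r$ primes used in the proof. The following is a rough numerical sketch, not a proof; the function names are hypothetical, and logarithms are summed so that $n$ itself is never formed.

```python
from math import log


def first_primes(count):
    """The first `count` primes, by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def wigert_ratio(r):
    """For n = product of the first r primes: log d(n) * log log n / log n."""
    log_n = sum(log(p) for p in first_primes(r))
    log_d = r * log(2)  # d(n) = 2^r for a product of r distinct primes
    return log_d * log(log_n) / log_n
```

For moderate $r$ the ratio stays somewhat above $\log 2$ and drifts towards it only slowly, which is consistent with the remark that $\epsilon_n$ tends very slowly to 0.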


† See § 22.13.

‡ In fact, we prove (Theorems 6 and 420) that $\vartheta(x) \sim x$, but it is of interest that the much simpler
Theorem 414 suffices here.
18.2. The average order of d(n). If $f(n)$ is an arithmetical function
and $g(n)$ is any simple function of $n$ such that

$$f(1) + f(2) + \cdots + f(n) \sim g(1) + \cdots + g(n), \tag{18.2.1}$$

we say that $f(n)$ is of the average order of $g(n)$. For many arithmetical
functions, the sum on the left-hand side of (18.2.1) behaves much more
regularly for large $n$ than does $f(n)$ itself. For $d(n)$, in particular, this is
true and we can prove very precise results about it.

Theorem 318:

$$d(1) + d(2) + \cdots + d(n) \sim n\log n.$$

Since

$$\log 1 + \log 2 + \cdots + \log n \sim \int_1^n \log t\,dt \sim n\log n,$$

the result of Theorem 318 is equivalent to

$$d(1) + d(2) + \cdots + d(n) \sim \log 1 + \log 2 + \cdots + \log n.$$

We may express this by saying

Theorem 319. The average order of $d(n)$ is $\log n$.

Both theorems are included in a more precise theorem, viz.

Theorem 320:

$$d(1) + d(2) + \cdots + d(n) = n\log n + (2\gamma - 1)n + O(\sqrt{n}),$$

where $\gamma$ is Euler’s constant.†
We prove these theorems by use of the lattice $L$ of Ch. III, whose vertices
are the points in the $(x, y)$-plane with integral coordinates. We denote by
$D$ the region in the upper right-hand quadrant contained between the axes
and the rectangular hyperbola $xy = n$. We count the lattice points in $D$,
including those on the hyperbola but not those on the axes. Every lattice
point in $D$ appears on a hyperbola

$$xy = s \qquad (1 \le s \le n);$$

and the number on such a hyperbola is $d(s)$. Hence the number of lattice
points in $D$ is

$$d(1) + d(2) + \cdots + d(n).$$

Of these points, $n = [n]$ have the $x$-coordinate 1, $[\frac{1}{2}n]$ have the
$x$-coordinate 2, and so on. Hence their number is

$$[n] + \left[\frac{n}{2}\right] + \left[\frac{n}{3}\right] + \cdots + \left[\frac{n}{n}\right] = n\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right) + O(n) = n\log n + O(n),$$

since the error involved in the removal of any square bracket is less than 1.
This result includes Theorem 318.

Theorem 320 requires a refinement of the method. We write

$$u = [\sqrt{n}],$$

so that

$$u^2 = n + O(\sqrt{n}) = n + O(u)$$

and

$$\log u = \log\{\sqrt{n} + O(1)\} = \tfrac{1}{2}\log n + O\left(\frac{1}{\sqrt{n}}\right).$$

† In Theorem 422 we prove that

$$1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n = \gamma + O\left(\frac{1}{n}\right),$$

where $\gamma$ is a constant, known as Euler’s constant.

In Fig. 8 the curve GEFH is the rectangular hyperbola $xy = n$, and the
coordinates of A, B, C, D are (0, 0), (0, u), (u, u), (u, 0). Since $(u + 1)^2 > n$,
there is no lattice point inside the small triangle ECF; and the figure is
symmetrical as between $x$ and $y$. Hence the number of lattice points in $D$ is
equal to twice the number in the strip between AY and DF, counting those on
DF and the curve but not those on AY, less the number in the square ADCB,
counting those on BC and CD but not those on AB and AD; and therefore

$$\sum_{l=1}^{n} d(l) = 2\left(\left[\frac{n}{1}\right] + \left[\frac{n}{2}\right] + \cdots + \left[\frac{n}{u}\right]\right) - u^2 = 2n\left(1 + \frac{1}{2} + \cdots + \frac{1}{u}\right) - n + O(u).$$

Now

$$2\left(1 + \frac{1}{2} + \cdots + \frac{1}{u}\right) = 2\log u + 2\gamma + O\left(\frac{1}{u}\right),$$

Fig. 8.

so that

$$\sum_{l=1}^{n} d(l) = 2n\log u + (2\gamma - 1)n + O(u) + O\left(\frac{n}{u}\right) = n\log n + (2\gamma - 1)n + O(\sqrt{n}).$$
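The two lattice-point counts used above translate directly into code. The sketch below is illustrative only (hypothetical function names; the value of Euler’s constant is entered numerically). It computes $d(1) + \cdots + d(n)$ both column by column and by the symmetric count with $u = [\sqrt{n}]$, and provides the main term of Theorem 320 for comparison.

```python
from math import isqrt, log

EULER_GAMMA = 0.5772156649015329  # numerical value of Euler's constant


def divisor_sum_direct(n):
    """d(1) + ... + d(n), counted by columns: [n/1] + [n/2] + ... + [n/n]."""
    return sum(n // x for x in range(1, n + 1))


def divisor_sum_hyperbola(n):
    """The same sum via 2([n/1] + ... + [n/u]) - u^2, where u = [sqrt(n)]."""
    u = isqrt(n)
    return 2 * sum(n // x for x in range(1, u + 1)) - u * u


def main_term(n):
    """n log n + (2 gamma - 1) n, the main term of Theorem 320."""
    return n * log(n) + (2 * EULER_GAMMA - 1) * n
```

The second function needs only about $\sqrt{n}$ terms, which is the practical benefit of the symmetry argument; the difference between either sum and `main_term(n)` is the quantity studied in the divisor problem.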

Although

$$\frac{1}{n}\sum_{l=1}^{n} d(l) \sim \log n,$$

it is not true that ‘most’ numbers $n$ have about $\log n$ divisors. Actually
‘almost all’ numbers have about

$$(\log n)^{\log 2} = (\log n)^{0.6\ldots}$$

divisors. The average $\log n$ is produced by the contributions of the small
proportion of numbers with abnormally large $d(n)$.†

† ‘Almost all’ is used in the sense of § 1.6. The theorem is proved in § 22.13.
This may be seen in another way, if we assume some theorems of
Ramanujan. The sum

$$d^2(1) + \cdots + d^2(n)$$

is of order $n(\log n)^{2^2 - 1} = n(\log n)^3$;

$$d^3(1) + \cdots + d^3(n)$$

is of order $n(\log n)^{2^3 - 1} = n(\log n)^7$; and so on. We should expect these
sums to be of order $n(\log n)^2$, $n(\log n)^3$, ..., if $d(n)$ were generally of the
order of $\log n$. But, as the power of $d(n)$ becomes larger, the numbers with
an abnormally large number of divisors dominate the average more and
more.

18.3. The order of σ(n). The irregularities in the behaviour of $\sigma(n)$ are
much less pronounced than those of $d(n)$.

Since $1 \mid n$ and $n \mid n$, we have first

Theorem 321:

$$\sigma(n) > n.$$

On the other hand,

Theorem 322:

$$\sigma(n) = O(n^{1 + \delta})$$

for every positive $\delta$.

More precisely,

Theorem 323:

$$\limsup_{n \to \infty} \frac{\sigma(n)}{n\log\log n} = e^{\gamma}.$$

We shall prove Theorem 322 in the next section, but must postpone the
proof of Theorem 323, which, with Theorem 321, shows that the order of
$\sigma(n)$ is always ‘very nearly $n$’, to § 22.9.
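As regards the average order, we have

Theorem 324. The average order of $\sigma(n)$ is $\frac{1}{6}\pi^2 n$. More precisely,

$$\sigma(1) + \sigma(2) + \cdots + \sigma(n) = \tfrac{1}{12}\pi^2 n^2 + O(n\log n).$$

For

$$\sigma(1) + \cdots + \sigma(n) = \sum y,$$

where the summation extends over all the lattice points $(x, y)$ in the region $D$ of
§ 18.2. Hence

$$\sum_{l=1}^{n} \sigma(l) = \sum_{x=1}^{n} \sum_{y \le n/x} y = \frac{1}{2}\sum_{x=1}^{n} \left[\frac{n}{x}\right]\left(\left[\frac{n}{x}\right] + 1\right)$$

$$= \frac{1}{2}\sum_{x=1}^{n} \left(\frac{n}{x} + O(1)\right)\left(\frac{n}{x} + O(1)\right) = \frac{1}{2}n^2\sum_{x=1}^{n} \frac{1}{x^2} + O\left(n\sum_{x=1}^{n} \frac{1}{x}\right).$$

Now

$$\sum_{x=1}^{n} \frac{1}{x^2} = \sum_{x=1}^{\infty} \frac{1}{x^2} + O\left(\frac{1}{n}\right) = \tfrac{1}{6}\pi^2 + O\left(\frac{1}{n}\right),$$

by (17.2.2), and

$$\sum_{x=1}^{n} \frac{1}{x} = O(\log n).$$

Hence

$$\sum_{l=1}^{n} \sigma(l) = \tfrac{1}{12}\pi^2 n^2 + O(n\log n).$$

In particular, the average order of $\sigma(n)$ is $\frac{1}{6}\pi^2 n$ (since $1 + 2 + \cdots + n \sim \frac{1}{2}n^2$).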
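The lattice-point form of the sum gives a quick way to tabulate $\sigma(1) + \cdots + \sigma(n)$ and to compare it with $\frac{1}{12}\pi^2 n^2$. This is a minimal sketch with a hypothetical function name, following the double sum over $x$ and $y \le n/x$ used in the proof.

```python
from math import pi


def sigma_sum(n):
    """sigma(1) + ... + sigma(n), as the sum over x of 1 + 2 + ... + [n/x]."""
    total = 0
    for x in range(1, n + 1):
        q = n // x
        total += q * (q + 1) // 2
    return total


def sigma_main_term(n):
    """The main term pi^2 n^2 / 12 of Theorem 324."""
    return pi * pi * n * n / 12
```

For example, `sigma_sum(10)` adds 1, 3, 4, 7, 6, 12, 8, 15, 13, 18 without computing any individual value of $\sigma$.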
18.4. The order of φ(n). The function $\phi(n)$ is also comparatively
regular, and its order is also always ‘nearly $n$’. In the first place

Theorem 325: $\phi(n) < n$ if $n > 1$.

Next, if $n = p^m$, and $p > 1/\epsilon$, then

$$\phi(n) = n\left(1 - \frac{1}{p}\right) > n(1 - \epsilon).$$

Hence

Theorem 326:

$$\limsup_{n \to \infty} \frac{\phi(n)}{n} = 1.$$

There are also two theorems for $\phi(n)$ corresponding to Theorems 322
and 323.

Theorem 327:

$$\frac{\phi(n)}{n^{1 - \delta}} \to \infty$$

for every positive $\delta$.

Theorem 328:

$$\liminf_{n \to \infty} \frac{\phi(n)\log\log n}{n} = e^{-\gamma}.$$

Theorem 327 is equivalent to Theorem 322, in virtue of

Theorem 329:

$$A < \frac{\sigma(n)\phi(n)}{n^2} < 1$$

(for a positive constant $A$).

To prove the last theorem we observe that, if $n = \prod p^a$, then

$$\sigma(n) = \prod_{p \mid n} \frac{p^{a+1} - 1}{p - 1} = n\prod_{p \mid n} \frac{1 - p^{-a-1}}{1 - p^{-1}}$$

and

$$\phi(n) = n\prod_{p \mid n} (1 - p^{-1}).$$

Hence

$$\frac{\sigma(n)\phi(n)}{n^2} = \prod_{p \mid n} (1 - p^{-a-1}),$$

which lies between 1 and $\prod (1 - p^{-2})$.† It follows that $\sigma(n)/n$ and $n/\phi(n)$
have the same order of magnitude, so that Theorem 327 is equivalent to
Theorem 322.

To prove Theorem 327 (and so Theorem 322) we write

$$f(n) = \frac{n^{1 - \delta}}{\phi(n)}.$$

Then $f(n)$ is multiplicative, and so, by Theorem 316, it is sufficient to
prove that

$$f(p^m) \to 0$$

when $p^m \to \infty$. But

$$\frac{1}{f(p^m)} = \frac{\phi(p^m)}{p^{m(1 - \delta)}} = p^{m\delta}\left(1 - \frac{1}{p}\right) \ge \tfrac{1}{2}p^{m\delta} \to \infty.$$

We defer the proof of Theorem 328 to Ch. XXII.

18.5. The average order of φ(n). The average order of $\phi(n)$ is $6n/\pi^2$.
More precisely


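Theorem 330:

$$\Phi(n) = \phi(1) + \cdots + \phi(n) = \frac{3n^2}{\pi^2} + O(n\log n).$$

For, by (16.3.1),

$$\Phi(n) = \sum_{m=1}^{n} m\sum_{d \mid m} \frac{\mu(d)}{d} = \sum_{dd' \le n} d'\mu(d) = \sum_{d=1}^{n} \mu(d)\sum_{d'=1}^{[n/d]} d'$$

$$= \frac{1}{2}\sum_{d=1}^{n} \mu(d)\left(\left[\frac{n}{d}\right]^2 + \left[\frac{n}{d}\right]\right) = \frac{1}{2}\sum_{d=1}^{n} \mu(d)\left\{\frac{n^2}{d^2} + O\left(\frac{n}{d}\right)\right\}$$

$$= \frac{1}{2}n^2\sum_{d=1}^{n} \frac{\mu(d)}{d^2} + O\left(n\sum_{d=1}^{n} \frac{1}{d}\right) = \frac{1}{2}n^2\sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + O\left(n^2\sum_{d=n+1}^{\infty} \frac{1}{d^2}\right) + O(n\log n)$$

$$= \frac{n^2}{2\zeta(2)} + O(n) + O(n\log n) = \frac{3n^2}{\pi^2} + O(n\log n),$$

by Theorem 287 and (17.2.2).

† By Theorem 280 and (17.2.2), we see that the $A$ of Theorem 329 is in fact $\{\zeta(2)\}^{-1} = 6\pi^{-2}$.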

The number of terms in the Farey series of order $n$ is $\Phi(n) + 1$, so that an
alternative form of Theorem 330 is

Theorem 331. The number of terms in the Farey series of order $n$ is
approximately $3n^2/\pi^2$.

Theorems 330 and 331 may be stated more picturesquely in the language
of probability. Suppose that $n$ is given, and consider all pairs of integers
$(p, q)$ for which

$$q > 0, \qquad 1 \le p \le q \le n,$$

and the corresponding fractions $p/q$. There are

$$\psi_n = \tfrac{1}{2}n(n + 1) \sim \tfrac{1}{2}n^2$$

such fractions, and $\chi_n$, the number of them which are in their lowest terms,
is $\Phi(n)$. If, as is natural, we define ‘the probability that $p$ and $q$ are prime
to one another’ as

$$\lim_{n \to \infty} \frac{\chi_n}{\psi_n},$$

we obtain

Theorem 332. The probability that two integers should be prime to one
another is $6/\pi^2$.
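The ratio $\chi_n/\psi_n$ in this definition can be computed directly for moderate $n$. The sketch below (hypothetical names, brute-force counting, intended only as an illustration) counts the fractions $p/q$ with $1 \le p \le q \le n$ that are in lowest terms and divides by $\frac{1}{2}n(n + 1)$.

```python
from math import gcd


def totient_sum(n):
    """Phi(n): the number of pairs (p, q), 1 <= p <= q <= n, with p prime to q."""
    return sum(
        1
        for q in range(1, n + 1)
        for p in range(1, q + 1)
        if gcd(p, q) == 1
    )


def coprime_ratio(n):
    """chi_n / psi_n: the proportion of the fractions p/q that are in lowest terms."""
    pairs = n * (n + 1) // 2
    return totient_sum(n) / pairs
```

Already for $n$ of a few hundred the ratio is close to $6/\pi^2 = 0.6079\ldots$, as Theorem 332 predicts.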
18.6. The number of squarefree numbers. An allied problem is that
of finding the probability that a number should be ‘squarefree’,† i.e. of
determining approximately the number $Q(x)$ of squarefree numbers not
exceeding $x$.

We can arrange all the positive integers $n \le y^2$ in sets $S_1, S_2, \ldots$, such
that $S_d$ contains just those $n$ whose largest square factor is $d^2$. Thus $S_1$ is
the set of all squarefree $n \le y^2$. The number of $n$ belonging to $S_d$ is

$$Q\left(\frac{y^2}{d^2}\right)$$

and, when $d > y$, $S_d$ is empty. Hence

$$[y^2] = \sum_{d \le y} Q\left(\frac{y^2}{d^2}\right)$$

and so, by Theorem 268,

$$Q(y^2) = \sum_{d \le y} \mu(d)\left[\frac{y^2}{d^2}\right] = \sum_{d \le y} \mu(d)\left\{\frac{y^2}{d^2} + O(1)\right\}$$

$$= y^2\sum_{d \le y} \frac{\mu(d)}{d^2} + O(y)$$

$$= y^2\sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + O\left(y^2\sum_{d > y} \frac{1}{d^2}\right) + O(y)$$

$$= \frac{y^2}{\zeta(2)} + O(y) = \frac{6y^2}{\pi^2} + O(y).$$

Replacing $y^2$ by $x$, we obtain

Theorem 333. The probability that a number should be squarefree is
$6/\pi^2$: more precisely

$$Q(x) = \frac{6x}{\pi^2} + O(\sqrt{x}).$$
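The inversion step in this proof gives a practical formula, $Q(x) = \sum_{d \le \sqrt{x}} \mu(d)[x/d^2]$ for integral $x$, which can be compared with a direct count. The following Python sketch is an illustration only; the function names are hypothetical.

```python
from math import isqrt


def mobius(n):
    """The Moebius function mu(n), by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def is_squarefree(n):
    """True when no square d^2 with d > 1 divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def squarefree_count_direct(x):
    """Q(x) by testing every n <= x."""
    return sum(1 for n in range(1, x + 1) if is_squarefree(n))


def squarefree_count_mobius(x):
    """Q(x) as the sum of mu(d) [x / d^2] over d <= sqrt(x)."""
    return sum(mobius(d) * (x // (d * d)) for d in range(1, isqrt(x) + 1))
```

The second routine uses only about $\sqrt{x}$ terms, and its value stays within $O(\sqrt{x})$ of $6x/\pi^2$, in line with Theorem 333.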


† Without square factors, a product of different primes: see § 17.8.
A number $n$ is squarefree if $\mu(n) = \pm 1$, or $|\mu(n)| = 1$. Hence an
alternative statement of Theorem 333 is

Theorem 334:

$$\sum_{n=1}^{x} |\mu(n)| = \frac{6x}{\pi^2} + O(\sqrt{x}).$$

It is natural to ask whether, among the squarefree numbers, those for
which $\mu(n) = 1$ and those for which $\mu(n) = -1$ occur with about the
same frequency. If they do so, then the sum

$$M(x) = \sum_{n=1}^{x} \mu(n)$$

should be of lower order than $x$; i.e.

Theorem 335:

$$M(x) = o(x).$$

This is true, but we must defer the proof until § 22.17.

18.7. The order of r(n). The function $r(n)$ behaves in some ways rather
like $d(n)$, as is to be expected after Theorem 278 and (16.9.2). If $n \equiv 3$
(mod 4), then $r(n) = 0$. If $n = (p_1 p_2 \cdots p_{l+1})^m$, and every $p$ is $4k + 1$, then
$r(n) = 4d(n)$. In any case $r(n) \le 4d(n)$. Hence we obtain the analogues
of Theorems 313, 314, and 315, viz.

Theorem 336:

$$\liminf r(n) = 0.$$

Theorem 337:

$$r(n) = O\{(\log n)^{A}\}$$

is false for every $A$.

Theorem 338:

$$r(n) = O(n^{\delta})$$

for every positive $\delta$.
There is also a theorem corresponding to Theorem 317; the maximum
order of $r(n)$ is

$$2^{\log n / \log\log n}.$$

A difference appears when we consider the average order.

Theorem 339. The average order of $r(n)$ is $\pi$; i.e.

$$\lim_{n \to \infty} \frac{r(1) + r(2) + \cdots + r(n)}{n} = \pi.$$

More precisely

$$r(1) + r(2) + \cdots + r(n) = \pi n + O(\sqrt{n}). \tag{18.7.1}$$

We can deduce this from Theorem 278, or prove it directly. The direct
proof is simpler. Since $r(m)$, the number of solutions of $x^2 + y^2 = m$, is the
number of lattice points of $L$ on the circle $x^2 + y^2 = m$, the sum (18.7.1) is
one less than the number of lattice points inside or on the circle $x^2 + y^2 = n$.
If we associate with each such lattice point the lattice square of which it is
the south-west corner, we obtain an area which is included in the circle

$$x^2 + y^2 = (\sqrt{n} + \sqrt{2})^2$$

and includes the circle

$$x^2 + y^2 = (\sqrt{n} - \sqrt{2})^2;$$

and each of these circles has an area $\pi n + O(\sqrt{n})$.

This geometrical argument may be extended to space of any number of dimensions.
Suppose, for example, that $r_3(n)$ is the number of integral solutions of

$$x^2 + y^2 + z^2 = n$$

(solutions differing only in sign or order being again regarded as distinct). Then we can
prove

Theorem 340:

$$r_3(1) + r_3(2) + \cdots + r_3(n) = \tfrac{4}{3}\pi n^{3/2} + O(n).$$
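If we use Theorem 278, we have

$$\sum_{1 \le v \le x} r(v) = 4\sum_{1 \le v \le x} \sum_{d \mid v} \chi(d) = 4\sum_{1 \le uv \le x} \chi(u),$$

the sum being extended over all the lattice points of the region $D$ of § 18.2.
If we write this in the form

$$4\sum_{1 \le u \le x} \chi(u)\sum_{1 \le v \le x/u} 1 = 4\sum_{1 \le u \le x} \chi(u)\left[\frac{x}{u}\right],$$

we obtain

Theorem 341:

$$\sum_{1 \le v \le x} r(v) = 4\left(\left[\frac{x}{1}\right] - \left[\frac{x}{3}\right] + \left[\frac{x}{5}\right] - \cdots\right).$$

This formula is true whether $x$ is an integer or not. If we sum separately
over the regions ADFY and DFX of § 18.2, and calculate the second part
of the sum by summing first along the horizontal lines of Fig. 8, we obtain

$$4\sum_{u \le \sqrt{x}} \chi(u)\left[\frac{x}{u}\right] + 4\sum_{v \le \sqrt{x}}\ \sum_{\sqrt{x} < u \le x/v} \chi(u).$$

The second sum is $O(\sqrt{x})$, since $\sum \chi(u)$, between any limits, is 0 or $\pm 1$,
and

$$\sum_{u \le \sqrt{x}} \chi(u)\left[\frac{x}{u}\right] = \sum_{u \le \sqrt{x}} \chi(u)\frac{x}{u} + O(\sqrt{x})$$

$$= x\left\{1 - \frac{1}{3} + \frac{1}{5} - \cdots + O\left(\frac{1}{\sqrt{x}}\right)\right\} + O(\sqrt{x})$$

$$= x\left\{\tfrac{1}{4}\pi + O\left(\frac{1}{\sqrt{x}}\right)\right\} + O(\sqrt{x}) = \tfrac{1}{4}\pi x + O(\sqrt{x}).$$

This gives the result of Theorem 339.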
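Both descriptions of $r(1) + \cdots + r(n)$, the lattice-point count and the alternating sum of Theorem 341, are easy to evaluate for integral $n$. The sketch below is an illustration with hypothetical function names; it assumes nothing beyond the two formulas just stated.

```python
from math import isqrt


def chi(u):
    """chi(u) = 0 for even u, and (-1)^((u-1)/2) for odd u."""
    if u % 2 == 0:
        return 0
    return 1 if u % 4 == 1 else -1


def lattice_points_in_circle(n):
    """Number of integer points (x, y) with x^2 + y^2 <= n, the origin included."""
    bound = isqrt(n)
    return sum(2 * isqrt(n - x * x) + 1 for x in range(-bound, bound + 1))


def r_sum_by_theorem_341(n):
    """r(1) + ... + r(n) computed as 4([n/1] - [n/3] + [n/5] - ...)."""
    return 4 * sum(chi(u) * (n // u) for u in range(1, n + 1))
```

The lattice count exceeds the alternating sum by exactly one (the origin), and both differ from $\pi n$ by an amount of order at most $\sqrt{n}$.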
NOTES

§ 18.1. For the proof of Theorem 315 see Pólya and Szegő, No. 264.

Theorem 317 is due to Wigert, Arkiv för matematik, 3, no. 18 (1907), 1-9 (Landau,
Handbuch, 219-22). Wigert’s proof depends upon the ‘prime number theorem’ (Theorem
6), but Ramanujan (Collected papers, 85-86) showed that it is possible to prove it in a more
elementary way. Our proof is essentially Wigert’s, modified so as not to require Theorem 6.

§ 18.2. Theorem 320 was proved by Dirichlet, Abhandl. Akad. Berlin (1849), 69-83
(Werke, ii. 49-66).

A great deal of work has been done since on the very difficult problem (‘Dirichlet’s
divisor problem’) of finding better bounds for the error in the approximation. Suppose that
$\theta$ is the lower bound of numbers $\beta$ such that

$$d(1) + d(2) + \cdots + d(n) = n\log n + (2\gamma - 1)n + O(n^{\beta}).$$

Theorem 320 shows that $\theta \le \frac{1}{2}$. Voronoi proved in 1903 that $\theta \le \frac{1}{3}$, and van der Corput in
1922 that $\theta < \frac{33}{100}$, and these numbers have been improved further by later writers. The
current (2007) record is due to Huxley (Proc. London Math. Soc. (3) 87 (2003), 591-609) and
states that $\theta \le \frac{131}{416}$. On the other hand, Hardy and Landau proved independently in 1915
that $\theta \ge \frac{1}{4}$. The true value of $\theta$ is still unknown. See also the note on § 18.7.

As regards the sums $d^2(1) + \cdots + d^2(n)$, etc., see Ramanujan, Collected papers, 133-5,
and B. M. Wilson, Proc. London Math. Soc. (2) 21 (1922), 235-55.

§ 18.3. Theorem 323 is due to Gronwall, Trans. American Math. Soc. 14 (1913), 113-22.
Theorem 324 stands as stated here in Bachmann, Analytische Zahlentheorie, 402. The
substance of it is contained in the memoir of Dirichlet referred to under § 18.2. The error term
has been improved slightly to $O(n(\log n)^{2/3})$ by Walfisz, Weylsche Exponentialsummen in
der neueren Zahlentheorie (Berlin, 1963). He similarly improved the error term in Theorem
330 to $O(n(\log n)^{2/3}(\log\log n)^{4/3})$.

§§ 18.4-5. Theorem 328 was proved by Landau, Archiv d. Math. u. Phys. (3) 5 (1903),
86-91 (Handbuch, 216-19); and Theorem 330 by Mertens, Journal für Math. 77 (1874),
289-338 (Landau, Handbuch, 578-9). Dirichlet (1849) proved a slightly weaker form of
Theorem 330, i.e. with error $O(n^{1 + \epsilon})$ for any $\epsilon > 0$ (Dickson, History, i, 119).

§ 18.6. Theorem 333 is due to Gegenbauer, Denkschriften Akad. Wien, 49, Abt. 1 (1885),
37-80 (Landau, Handbuch, 580-2). The error term has been improved by various authors,
the current (2007) record being $O(x^{\theta})$, for any $\theta > \frac{17}{54}$, due to Jia (Sci. China Ser. A 36
(1993), 154-169).

Landau [Handbuch, ii. 588-90] showed that Theorem 335 follows simply from the
‘prime number theorem’ (Theorem 6) and later [Sitzungsberichte Akad. Wien, 120, Abt. 2
(1911), 973-88] that Theorem 6 follows readily from Theorem 335. Mertens conjectured
that $|M(x)| \le x^{1/2}$ for all $x > 1$. However this was disproved by Odlyzko and te Riele
(J. Reine Angew. Math. 357 (1985), 138-160), who showed in fact that there are infinitely
many integral $x$ for which $M(x) > \sqrt{x}$, and similarly for which $M(x) < -\sqrt{x}$. No specific
example of such an $x > 1$ is known, and Odlyzko and te Riele suggest that there is no
example below $10^{20}$, or even $10^{30}$.

§ 18.7. For Theorem 339 see Gauss, Werke, ii. 272-5.

This theorem, like Theorem 320, has been the starting-point of a great deal of modern
work, the aim being the determination of the number $\theta$ corresponding to the $\theta$ of the note
on § 18.2. The problem is very similar to the divisor problem, and the numbers 
occur in the same kind of way; but the analysis required is in some ways a little simpler. See 
Landau, Vorlesungen, ii. 183-308. As with Theorem 320 the current (2007) record is due to
Huxley (Proc. London Math. Soc. (3) 87 (2003), 591-609) and states again that $\theta \le \frac{131}{416}$.

The error term in Theorem 340 has been investigated by a number of authors. The best
known result up to 2007 is due to Heath-Brown (Number theory in progress, Vol. 2, 883-92,
(Berlin, 1999)), and states that the error is $O(n^{\theta})$ for any $\theta > \frac{21}{32}$.

Atkinson and Cherwell (Quart. J. Math. Oxford, 20 (1949), 65-79) give a general method
of calculating the ‘average order’ of arithmetical functions belonging to a wide class. For
deeper methods, see Wirsing (Acta Math. Acad. Sci. Hungaricae 18 (1967), 411-67) and
Halász (ibid. 19 (1968), 365-403).
PARTITIONS


19.1. The general problem of additive arithmetic. In this and the next
two chapters we shall be occupied with the additive theory of numbers. The
general problem of the theory may be stated as follows.

Suppose that $A$ or

$$a_1, a_2, a_3, \ldots$$

is a given system of integers. Thus $A$ might contain all the positive integers,
or the squares, or the primes. We consider all possible representations of
an arbitrary positive integer $n$ in the form

$$n = a_{i_1} + a_{i_2} + \cdots + a_{i_s},$$

where $s$ may be fixed or unrestricted, the $a$ may or may not be necessarily
different, and order may or may not be relevant, according to the particular
problem considered. We denote by $r(n)$ the number of such representations.
Then what can we say about $r(n)$? For example, is $r(n)$ always positive?
Is there always at any rate one representation of every $n$?

19.2. Partitions of numbers. We take first the case in which A is the set 
1, 2, 3, ... of all positive integers, s is unrestricted, repetitions are allowed, 
and order is irrelevant. This is the problem of ‘unrestricted partitions’. 

A partition of a number $n$ is a representation of $n$ as the sum of any
number of positive integral parts. Thus

$$5 = 4 + 1 = 3 + 2 = 3 + 1 + 1 = 2 + 2 + 1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1$$

has 7 partitions.† The order of the parts is irrelevant, so that we may,
when we please, suppose the parts to be arranged in descending order of
magnitude. We denote by $p(n)$ the number of partitions of $n$; thus $p(5) = 7$.

We can represent a partition graphically by an array of dots or ‘nodes’
such as

A    • • • • • • •
     • • • •
     • • •
     • • •
     •

the dots in a row corresponding to a part. Thus A represents the partition

$$7 + 4 + 3 + 3 + 1$$

of 18.

We might also read A by columns, in which case it would represent the
partition

$$5 + 4 + 4 + 2 + 1 + 1 + 1$$

of 18. Partitions related in this manner are said to be conjugate.

† We have, of course, to count the representation by one part only.

A number of theorems about partitions follow immediately from this
graphical representation. A graph with $m$ rows, read horizontally, represents
a partition into $m$ parts; read vertically, it represents a partition into
parts the largest of which is $m$. Hence

Theorem 342. The number of partitions of $n$ into $m$ parts is equal to the
number of partitions of $n$ into parts the largest of which is $m$.

Similarly,

Theorem 343. The number of partitions of $n$ into at most $m$ parts is equal
to the number of partitions of $n$ into parts which do not exceed $m$.
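Reading a graph by columns is a simple operation on the list of parts, and Theorem 342 can be checked by enumeration for small $n$. The following Python sketch is an illustration only; `conjugate` and `partitions` are hypothetical helper names, and a partition is represented as a list of parts in descending order.

```python
def conjugate(partition):
    """Read the graph of a partition (parts in descending order) by columns."""
    if not partition:
        return []
    return [
        sum(1 for part in partition if part > column)
        for column in range(partition[0])
    ]


def partitions(n, largest=None):
    """Yield every partition of n as a descending list of parts, each <= largest."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest
```

Applied to the partition 7 + 4 + 3 + 3 + 1 of 18, `conjugate` returns 5 + 4 + 4 + 2 + 1 + 1 + 1. Since conjugation pairs each partition into $m$ parts with one whose largest part is $m$, the two counts in Theorem 342 agree.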

We shall make further use of ‘graphical’ arguments of this character, but 
usually we shall need the more powerful weapons provided by the theory 
of generating functions. 

19.3. The generating function of p(n). The generating functions
which are useful here are power series†

$$F(x) = \sum f(n)x^n.$$

The sum of the series whose general coefficient is $f(n)$ is called the
generating function of $f(n)$, and is said to enumerate $f(n)$.

† Compare § 17.10.
The generating function of $p(n)$ was found by Euler, and is

$$F(x) = \frac{1}{(1 - x)(1 - x^2)(1 - x^3)\cdots} = 1 + \sum_{n=1}^{\infty} p(n)x^n. \tag{19.3.1}$$

We can see this by writing the infinite product as

$$(1 + x + x^2 + \cdots)$$
$$(1 + x^2 + x^4 + \cdots)$$
$$(1 + x^3 + x^6 + \cdots)$$
$$\cdots$$

and multiplying the series together. Every partition of $n$ contributes just 1
to the coefficient of $x^n$. Thus the partition

$$10 = 3 + 2 + 2 + 2 + 1$$

corresponds to the product of $x^3$ in the third row, $x^6 = x^{2+2+2}$ in the second,
and $x$ in the first; and this product contributes a unit to the coefficient of $x^{10}$.

This makes (19.3.1) intuitive, but (since we have to multiply an infinity
of infinite series) some development of the argument is necessary.

Suppose that $0 < x < 1$, so that the product which defines $F(x)$ is
convergent. The series

$$1 + x + x^2 + \cdots, \quad 1 + x^2 + x^4 + \cdots, \quad \ldots, \quad 1 + x^m + x^{2m} + \cdots,$$

are absolutely convergent, and we can multiply them together and arrange
the result as we please. The coefficient of $x^n$ in the product is

$$p_m(n),$$

the number of partitions of $n$ into parts not exceeding $m$. Hence

$$F_m(x) = \frac{1}{(1 - x)(1 - x^2)\cdots(1 - x^m)} = 1 + \sum_{n=1}^{\infty} p_m(n)x^n. \tag{19.3.2}$$

It is plain that

$$p_m(n) \le p(n), \tag{19.3.3}$$

that

$$p_m(n) = p(n) \tag{19.3.4}$$

for $n \le m$, and that

$$p_m(n) \to p(n), \tag{19.3.5}$$

when $m \to \infty$, for every $n$. And

$$F_m(x) = 1 + \sum_{n=1}^{m} p(n)x^n + \sum_{n=m+1}^{\infty} p_m(n)x^n. \tag{19.3.6}$$

The left-hand side is less than $F(x)$ and tends to $F(x)$ when $m \to \infty$.
Thus

$$1 + \sum_{n=1}^{m} p(n)x^n < F_m(x) < F(x),$$

which is independent of $m$. Hence $\sum p(n)x^n$ is convergent, and so, after
(19.3.3), $\sum p_m(n)x^n$ converges, for any fixed $x$ of the range $0 < x < 1$,
uniformly for all values of $m$. Finally, it follows from (19.3.5) that

$$1 + \sum_{n=1}^{\infty} p(n)x^n = \lim_{m \to \infty}\left(1 + \sum_{n=1}^{\infty} p_m(n)x^n\right) = \lim_{m \to \infty} F_m(x) = F(x).$$

Incidentally, we have proved that

$$\frac{1}{(1-x)(1-x^2)\cdots(1-x^m)} \tag{19.3.7}$$

enumerates the partitions of n into parts which do not exceed m or (what
is the same thing, after Theorem 343) into at most m parts.
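The formal multiplication described above is easily carried out by machine. The following Python sketch is an illustration added here, not part of the original argument; the function name `partition_counts` is ours. It multiplies in the factors 1/(1 − x^k) one at a time, discarding powers of x beyond a chosen limit, and so should reproduce p_m(n) and, when the parts are unrestricted, p(n).

```python
def partition_counts(limit, max_part=None):
    """Coefficients of 1/((1-x)(1-x^2)...(1-x^m)) up to x^limit.

    With max_part=None every part size up to `limit` is allowed, which
    gives p(n) for n <= limit; otherwise the result is p_m(n).
    """
    m = limit if max_part is None else max_part
    coeffs = [1] + [0] * limit
    for part in range(1, m + 1):
        # Multiply the truncated series by 1/(1 - x^part).
        for n in range(part, limit + 1):
            coeffs[n] += coeffs[n - part]
    return coeffs


p = partition_counts(10)               # p(0), ..., p(10)
p3 = partition_counts(10, max_part=3)  # p_3(0), ..., p_3(10)
print(p)
print(p3)
```

The two lists agree for n ≤ 3 and satisfy p_3(n) ≤ p(n) throughout, as (19.3.3) and (19.3.4) require.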

We have written out the proof of the fundamental formula (19.3.1) in
detail. We have proved it for 0 < x < 1, and its truth for |x| < 1 follows at
once from familiar theorems of analysis. In what follows we shall pay no
attention to such ‘convergence theorems’,† since the interest of the subject-
matter is essentially formal. The series and products with which we deal
are all absolutely convergent for small x (and usually, as here, for |x| < 1).
The questions of convergence, identity, and so on, which arise are trivial,
and can be settled at once by any reader who knows the elements of the
theory of functions.

† Except once in § 19.8, where again we are concerned with a fundamental identity, and once in
§ 19.9, where the limit process involved is less obvious.

19.4. Other generating functions. It is equally easy to find the 
generating functions which enumerate the partitions of n into parts 
restricted in various ways. Thus

$$\frac{1}{(1-x)(1-x^3)(1-x^5)\cdots} \tag{19.4.1}$$

enumerates partitions into odd parts;

$$\frac{1}{(1-x^2)(1-x^4)(1-x^6)\cdots} \tag{19.4.2}$$

partitions into even parts;

$$(1+x)(1+x^2)(1+x^3)\cdots \tag{19.4.3}$$

partitions into unequal parts;

$$(1+x)(1+x^3)(1+x^5)\cdots \tag{19.4.4}$$

partitions into parts which are both odd and unequal; and

$$\frac{1}{(1-x)(1-x^4)(1-x^6)(1-x^9)\cdots}, \tag{19.4.5}$$

where the indices are the numbers 5m + 1 and 5m + 4, partitions into parts
each of which is of one of these forms.

Another function which will occur later is

$$\frac{x^N}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}. \tag{19.4.6}$$

This enumerates the partitions of n − N into even parts not exceeding 2m,
or of $\tfrac12(n-N)$ into parts not exceeding m; or again, after Theorem 343,
the partitions of $\tfrac12(n-N)$ into at most m parts.

Some properties of partitions may be deduced at once from the forms of
these generating functions. Thus

$$(1+x)(1+x^2)(1+x^3)\cdots = \frac{1-x^2}{1-x}\,\frac{1-x^4}{1-x^2}\,\frac{1-x^6}{1-x^3}\cdots = \frac{1}{(1-x)(1-x^3)(1-x^5)\cdots}. \tag{19.4.7}$$


Hence 

Theorem 344. The number of partitions of n into unequal parts is equal 
to the number of its partitions into odd parts. 
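Theorem 344 can be checked numerically by expanding the two generating functions (19.4.3) and (19.4.1) as far as a fixed power of x. The sketch below is an added illustration; the names `count_distinct` and `count_odd` are ours, and a finite check of this kind is of course no substitute for the proof.

```python
def count_distinct(limit):
    """Coefficients of (1+x)(1+x^2)(1+x^3)... up to x^limit."""
    coeffs = [1] + [0] * limit
    for part in range(1, limit + 1):
        # Multiply by (1 + x^part); going downwards uses each part at most once.
        for n in range(limit, part - 1, -1):
            coeffs[n] += coeffs[n - part]
    return coeffs


def count_odd(limit):
    """Coefficients of 1/((1-x)(1-x^3)(1-x^5)...) up to x^limit."""
    coeffs = [1] + [0] * limit
    for part in range(1, limit + 1, 2):
        # Multiply by 1/(1 - x^part); going upwards allows repeated parts.
        for n in range(part, limit + 1):
            coeffs[n] += coeffs[n - part]
    return coeffs


unequal = count_distinct(60)
odd = count_odd(60)
print(unequal[:11])
print(unequal == odd)
```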

It is interesting to prove this without the use of generating functions. 
Any number l can be expressed uniquely in the binary scale, i.e. as

$$l = 2^a + 2^b + 2^c + \cdots \qquad (0 \le a < b < c < \cdots).$$ †

Hence a partition of n into odd parts can be written as

$$\begin{aligned}
n &= l_1 \cdot 1 + l_2 \cdot 3 + l_3 \cdot 5 + \cdots \\
  &= (2^{a_1} + 2^{b_1} + \cdots)\,1 + (2^{a_2} + 2^{b_2} + \cdots)\,3 + (2^{a_3} + \cdots)\,5 + \cdots;
\end{aligned}$$

and there is a (1,1) correspondence between this partition and the partition
into the unequal parts

$$2^{a_1},\ 2^{b_1},\ \ldots,\ 2^{a_2}\cdot 3,\ 2^{b_2}\cdot 3,\ \ldots,\ 2^{a_3}\cdot 5,\ 2^{b_3}\cdot 5,\ \ldots.$$
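The correspondence just described can be written out explicitly. The following Python sketch is an added illustration with hypothetical function names: `odd_to_distinct` expands the multiplicity of each odd part in the binary scale, and `distinct_to_odd` reverses the process by removing the powers of 2 from each part.

```python
from collections import Counter


def odd_to_distinct(odd_parts):
    """Map a partition into odd parts to one into unequal parts."""
    result = []
    for odd, multiplicity in Counter(odd_parts).items():
        power = 1
        while multiplicity:
            if multiplicity & 1:           # this binary digit of l is 1
                result.append(power * odd)
            multiplicity >>= 1
            power <<= 1
    return sorted(result, reverse=True)


def distinct_to_odd(distinct_parts):
    """Inverse map: split each part 2^a * (odd) into 2^a copies of (odd)."""
    result = []
    for part in distinct_parts:
        copies = 1
        while part % 2 == 0:
            part //= 2
            copies *= 2
        result.extend([part] * copies)
    return sorted(result, reverse=True)


print(odd_to_distinct([5, 3, 3, 3, 1, 1]))
print(distinct_to_odd([6, 5, 3, 2]))
```

Here 16 = 5 + 3 + 3 + 3 + 1 + 1 corresponds to 16 = 6 + 5 + 3 + 2, since the multiplicities are 1 = 2^0, 3 = 2^0 + 2^1 and 2 = 2^1.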


19.5. Two theorems of Euler. There are two identities due to Euler 
which give instructive illustrations of different methods of proof used 
frequently in this theory. 

Theorem 345:

$$(1+x)(1+x^3)(1+x^5)\cdots = 1 + \frac{x}{1-x^2} + \frac{x^4}{(1-x^2)(1-x^4)} + \frac{x^9}{(1-x^2)(1-x^4)(1-x^6)} + \cdots.$$

† This is the arithmetic equivalent of the identity

$$(1+x)(1+x^2)(1+x^4)(1+x^8)\cdots = \frac{1}{1-x}.$$

Theorem 346:

$$(1+x^2)(1+x^4)(1+x^6)\cdots = 1 + \frac{x^2}{1-x^2} + \frac{x^6}{(1-x^2)(1-x^4)} + \frac{x^{12}}{(1-x^2)(1-x^4)(1-x^6)} + \cdots.$$

In Theorem 346 the indices in the numerators are 1·2, 2·3, 3·4, ....

(i) We first prove these theorems by Euler’s device of the introduction 
of a second parameter a. 

Let 


$$\begin{aligned}
K(a) = K(a,x) &= (1+ax)(1+ax^3)(1+ax^5)\cdots \\
              &= 1 + c_1 a + c_2 a^2 + \cdots,
\end{aligned}$$

where $c_n = c_n(x)$ is independent of a. Plainly

$$K(a) = (1+ax)\,K(ax^2)$$

or

$$1 + c_1 a + c_2 a^2 + \cdots = (1+ax)(1 + c_1 a x^2 + c_2 a^2 x^4 + \cdots).$$

Hence, equating coefficients, we obtain

$$c_1 = x + c_1 x^2,\quad c_2 = c_1 x^3 + c_2 x^4,\quad \ldots,\quad c_m = c_{m-1}x^{2m-1} + c_m x^{2m},\quad \ldots,$$


and so

$$c_m = \frac{x^{2m-1}}{1-x^{2m}}\,c_{m-1} = \frac{x^{1+3+\cdots+(2m-1)}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})} = \frac{x^{m^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}.$$

It follows that

$$(1+ax)(1+ax^3)(1+ax^5)\cdots = 1 + \frac{ax}{1-x^2} + \frac{a^2x^4}{(1-x^2)(1-x^4)} + \cdots, \tag{19.5.1}$$

and Theorems 345 and 346 are the special cases a = 1 and a = x.
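The two special cases of (19.5.1) can be compared coefficient by coefficient. In the sketch below, added as an illustration, we put a = x^t, so that t = 0 gives Theorem 345 and t = 1 gives Theorem 346; the truncation order N and all function names are our own choices.

```python
N = 40  # compare coefficients of x^0, ..., x^N


def mul(a, b):
    """Product of two power series (coefficient lists), truncated at x^N."""
    out = [0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j > N:
                    break
                out[i + j] += ai * bj
    return out


def one_plus(e):
    """The series 1 + x^e (the second term is dropped if e > N)."""
    s = [1] + [0] * N
    if e <= N:
        s[e] += 1
    return s


def geometric(step):
    """The series 1/(1 - x^step)."""
    return [1 if n % step == 0 else 0 for n in range(N + 1)]


def left_side(t):
    """(1 + a x)(1 + a x^3)(1 + a x^5)... with a = x^t."""
    series = [1] + [0] * N
    for k in range(1, N + 1, 2):
        series = mul(series, one_plus(k + t))
    return series


def right_side(t):
    """Sum over m of a^m x^(m^2) / ((1-x^2)...(1-x^(2m))) with a = x^t."""
    total = [0] * (N + 1)
    inverse_denominator = [1] + [0] * N
    m = 0
    while m * m + t * m <= N:
        if m > 0:
            inverse_denominator = mul(inverse_denominator, geometric(2 * m))
        shift = m * m + t * m
        for n in range(shift, N + 1):
            total[n] += inverse_denominator[n - shift]
        m += 1
    return total


print(left_side(0) == right_side(0))  # Theorem 345
print(left_side(1) == right_side(1))  # Theorem 346
```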
(ii) The theorems can also be proved by arguments independent of
the theory of infinite series. Such proofs are sometimes described as
‘combinatorial’. We select Theorem 345.

We have seen that the left-hand side of the identity enumerates partitions
into odd and unequal parts: thus

15 = 11 + 3 + 1 = 9 + 5 + 1 = 7 + 5 + 3

has 4 such partitions. Let us take, for example, the partition 11+3+1, and 
represent it graphically as in B, the points on one bent line corresponding 
to a part of the partition. 



We can also read the graph (considered as an array of points) as in C or 
D, along a series of horizontal or vertical lines. The graphs C and D differ 
only in orientation, and each of them corresponds to another partition of 
15, viz. 6+3+3+1+1+1. A partition like this, symmetrical about the south-
easterly direction, is called by MacMahon a self-conjugate partition, and the
graphs establish a (1,1) correspondence between self-conjugate partitions 
and partitions into odd and unequal parts. The left-hand side of the identity 
enumerates odd and unequal partitions, and therefore the identity will be 
proved if we can show that its right-hand side enumerates self-conjugate 
partitions. 

Now our array of points may be read in a fourth way, viz. as in E. 



[Figure: graph E]

Here we have a square of $3^2$ points, and two ‘tails’, each representing a
partition of $\tfrac12(15 - 3^2) = 3$ into 3 parts at most (and in this particular case
all 1’s). Generally, a self-conjugate partition of n can be read as a square of
$m^2$ points, and two tails representing partitions of

$$\tfrac12(n - m^2)$$

into m parts at most. Given the (self-conjugate) partition, then m and the
reading of the partition are fixed; conversely, given n, and given any square
$m^2$ not exceeding n, there is a group of self-conjugate partitions of n based
upon a square of $m^2$ points.

Now

$$\frac{x^{m^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}$$

is a special case of (19.4.6), and enumerates the number of partitions of
$\tfrac12(n - m^2)$ into at most m parts, and each of these corresponds as we have
seen to a self-conjugate partition of n based upon a square of $m^2$ points.
Hence, summing with respect to m,

$$1 + \sum_{m=1}^{\infty} \frac{x^{m^2}}{(1-x^2)(1-x^4)\cdots(1-x^{2m})}$$

enumerates all self-conjugate partitions of n, and this proves the theorem.

Incidentally, we have proved 

Theorem 347. The number of partitions of n into odd and unequal parts
is equal to the number of its self-conjugate partitions.
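The correspondence between self-conjugate partitions and partitions into odd and unequal parts can be tested by direct enumeration. The sketch below is an added illustration; the function names are ours, and `diagonal_hooks` is our reading of the graph along its bent lines.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def conjugate(p):
    """Read the graph of p by columns instead of rows."""
    if not p:
        return ()
    return tuple(sum(1 for part in p if part > i) for i in range(p[0]))


def diagonal_hooks(p):
    """Read a self-conjugate partition along its bent lines."""
    return tuple(2 * (p[i] - i) - 1 for i in range(len(p)) if p[i] > i)


def is_odd_and_unequal(p):
    return all(part % 2 == 1 for part in p) and len(set(p)) == len(p)


n = 15
self_conjugate = [p for p in partitions(n) if conjugate(p) == p]
odd_unequal = [p for p in partitions(n) if is_odd_and_unequal(p)]
print(len(self_conjugate), len(odd_unequal))
print(diagonal_hooks((6, 3, 3, 1, 1, 1)))
```

For n = 15 both lists contain 4 partitions, and the self-conjugate partition 6+3+3+1+1+1 is read as 11+3+1.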

Our argument suffices to prove the more general identity (19.5.1), and 
show its combinatorial meaning. The number of partitions of n into just m 
odd and unequal parts is equal to the number of self-conjugate partitions 
of n based upon a square of m 2 points. The effect of putting a = 1 is to 
obliterate the distinction between different values of m. 

The reader will find it instructive to give a combinatorial proof of
Theorem 346. It is best to begin by replacing $x^2$ by x, and to use the
decomposition 1 + 2 + 3 + ... + m of $\tfrac12 m(m+1)$. The square of (ii) is
replaced by an isosceles right-angled triangle.

19.6. Further algebraical identities. We can use the method (i) of 
§ 19.5 to prove a large number of algebraical identities. Suppose, for 
example, that 


$$K_j(a) = K_j(a,x) = (1+ax)(1+ax^2)\cdots(1+ax^j) = \sum_{m=0}^{j} c_m a^m.$$

Then

$$(1+ax^{j+1})\,K_j(a) = (1+ax)\,K_j(ax).$$

Inserting the power series, and equating the coefficients of $a^m$, we obtain

$$c_m + c_{m-1}x^{j+1} = (c_m + c_{m-1})\,x^m$$

or

$$(1-x^m)\,c_m = (x^m - x^{j+1})\,c_{m-1} = x^m(1-x^{j-m+1})\,c_{m-1},$$

for 1 ≤ m ≤ j. Hence

Theorem 348:

$$\begin{aligned}
(1+ax)(1+ax^2)\cdots(1+ax^j) = 1 &+ ax\,\frac{1-x^j}{1-x} + a^2x^3\,\frac{(1-x^j)(1-x^{j-1})}{(1-x)(1-x^2)} + \cdots \\
&+ a^m x^{\frac12 m(m+1)}\,\frac{(1-x^j)\cdots(1-x^{j-m+1})}{(1-x)\cdots(1-x^m)} + \cdots + a^j x^{\frac12 j(j+1)}.
\end{aligned}$$
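Since both sides of Theorem 348 are finite expressions, the identity can be tested exactly by substituting rational values for a and x. The sketch below is an added illustration; the particular values of a and x are arbitrary choices of ours, and the running factor `ratio` is built up by the recurrence for $c_m$ obtained above.

```python
from fractions import Fraction


def product_side(a, x, j):
    """(1 + a x)(1 + a x^2)...(1 + a x^j), computed exactly."""
    value = Fraction(1)
    for i in range(1, j + 1):
        value *= 1 + a * x**i
    return value


def series_side(a, x, j):
    """Right-hand side of Theorem 348, term by term."""
    total = Fraction(0)
    ratio = Fraction(1)  # (1-x^j)...(1-x^(j-m+1)) / ((1-x)...(1-x^m))
    for m in range(0, j + 1):
        if m > 0:
            ratio *= (1 - x**(j - m + 1)) / (1 - x**m)
        total += a**m * x**(m * (m + 1) // 2) * ratio
    return total


a, x = Fraction(2, 5), Fraction(1, 3)
for j in range(8):
    print(j, product_side(a, x, j) == series_side(a, x, j))
```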

If we write $x^2$ for x, 1/x for a, and make j → ∞, we obtain Theorem
345. Similarly we can prove

Theorem 349:

$$\frac{1}{(1-ax)(1-ax^2)\cdots(1-ax^j)} = 1 + ax\,\frac{1-x^j}{1-x} + a^2x^2\,\frac{(1-x^j)(1-x^{j+1})}{(1-x)(1-x^2)} + \cdots.$$

In particular, if we put a = 1, and make j → ∞, we obtain

Theorem 350:

$$\frac{1}{(1-x)(1-x^2)\cdots} = 1 + \frac{x}{1-x} + \frac{x^2}{(1-x)(1-x^2)} + \cdots.$$





19.7. Another formula for F(x). As a further example of 
‘combinatorial’ reasoning we prove another theorem of Euler, viz. 

Theorem 351:

$$\frac{1}{(1-x)(1-x^2)(1-x^3)\cdots} = 1 + \frac{x}{(1-x)^2} + \frac{x^4}{(1-x)^2(1-x^2)^2} + \frac{x^9}{(1-x)^2(1-x^2)^2(1-x^3)^2} + \cdots.$$

The graphical representation of any partition, say

[Figure: graph F]

contains a square of nodes in the north-west corner. If we take the largest
such square, called the ‘Durfee square’ (here a square of 9 nodes), then the
graph consists of a square containing $i^2$ nodes and two tails; one of these
tails represents the partition of a number, say l, into not more than i parts,
the other the partition of a number, say m, into parts not exceeding i; and

$$n = i^2 + l + m.$$

In the figure n = 20, i = 3, l = 6, m = 5.

The number of partitions of l (into at most i parts) is, after § 19.3, the
coefficient of $x^l$ in

$$\frac{1}{(1-x)(1-x^2)\cdots(1-x^i)},$$

and the number of partitions of m (into parts not exceeding i) is the
coefficient of $x^m$ in the same expansion. Hence the coefficient of $x^{n-i^2}$ in

$$\left\{\frac{1}{(1-x)(1-x^2)\cdots(1-x^i)}\right\}^2,$$

or of $x^n$ in

$$\frac{x^{i^2}}{(1-x)^2(1-x^2)^2\cdots(1-x^i)^2},$$

is the number of possible pairs of tails in a partition of n in which the Durfee
square is $i^2$. And hence the total number of partitions of n is the coefficient
of $x^n$ in the expansion of

$$1 + \frac{x}{(1-x)^2} + \frac{x^4}{(1-x)^2(1-x^2)^2} + \cdots + \frac{x^{i^2}}{(1-x)^2(1-x^2)^2\cdots(1-x^i)^2} + \cdots.$$

This proves the theorem. 
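The classification by Durfee squares can be checked for a particular n. The following sketch, added as an illustration with names of our own choosing, sorts the partitions of 20 by the side of their Durfee square and compares the counts with the number of pairs of tails given by the argument above.

```python
from collections import Counter


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def durfee_side(p):
    """Side i of the largest square in the north-west corner of the graph."""
    side = 0
    while side < len(p) and p[side] > side:
        side += 1
    return side


def bounded_counts(limit, max_part):
    """Numbers of partitions of 0..limit into parts not exceeding max_part."""
    coeffs = [1] + [0] * limit
    for part in range(1, max_part + 1):
        for k in range(part, limit + 1):
            coeffs[k] += coeffs[k - part]
    return coeffs


def pairs_of_tails(n, i):
    """Coefficient of x^n in x^(i^2) / ((1-x)^2 (1-x^2)^2 ... (1-x^i)^2)."""
    rest = n - i * i
    if rest < 0:
        return 0
    c = bounded_counts(rest, i)
    return sum(c[l] * c[rest - l] for l in range(rest + 1))


n = 20
by_enumeration = Counter(durfee_side(p) for p in partitions(n))
by_formula = {i: pairs_of_tails(n, i) for i in range(1, 5)}
print(dict(by_enumeration))
print(by_formula)
```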

There are also simple algebraical* proofs.

19.8. A theorem of Jacobi. We shall require later certain special cases 
of a famous identity which belongs properly to the theory of elliptic 
functions. 

Theorem 352. If |x| < 1, then

$$\prod_{n=1}^{\infty}\bigl\{(1-x^{2n})(1+x^{2n-1}z)(1+x^{2n-1}z^{-1})\bigr\} = 1 + \sum_{n=1}^{\infty} x^{n^2}(z^n + z^{-n}) = \sum_{n=-\infty}^{\infty} x^{n^2}z^n \tag{19.8.1}$$

for all z except z = 0.

The two forms of the series are obviously equivalent.
Let us write

$$P(x,z) = Q(x)\,R(x,z)\,R(x,z^{-1}),$$

where

$$Q(x) = \prod_{n=1}^{\infty}(1-x^{2n}),\qquad R(x,z) = \prod_{n=1}^{\infty}(1+x^{2n-1}z).$$

* We use the word ‘algebraical’ in its old-fashioned sense, in which it includes elementary manipu-
lation of power series or infinite products. Such proofs involve (though sometimes only superficially)
the use of limiting processes, and are, in the strict sense of the word, ‘analytical’; but the word ‘analyt-
ical’ is usually reserved, in the theory of numbers, for proofs which depend upon analysis of a deeper
kind (usually upon the theory of functions of a complex variable).

When |x| < 1 and z ≠ 0, the infinite products

$$\prod_{n=1}^{\infty}(1+|x|^{2n}),\qquad \prod_{n=1}^{\infty}(1+|x^{2n-1}z|),\qquad \prod_{n=1}^{\infty}(1+|x^{2n-1}z^{-1}|)$$

are all convergent. Hence the products Q(x), R(x,z), $R(x,z^{-1})$ and the
product P(x,z) may be formally multiplied out and the resulting terms
collected and arranged in any way we please; the resulting series is absolutely
convergent and its sum is equal to P(x,z). In particular,

$$P(x,z) = \sum_{n=-\infty}^{\infty} a_n(x)\,z^n,$$

where $a_n(x)$ does not depend on z and

$$a_{-n}(x) = a_n(x). \tag{19.8.2}$$

Provided x ≠ 0, we can easily verify that

$$(1+xz)\,R(x,zx^2) = R(x,z),\qquad R(x,z^{-1}x^{-2}) = (1+z^{-1}x^{-1})\,R(x,z^{-1}),$$

so that $xz\,P(x,zx^2) = P(x,z)$. Hence

$$\sum_{n=-\infty}^{\infty} x^{2n+1}a_n(x)\,z^{n+1} = \sum_{n=-\infty}^{\infty} a_n(x)\,z^n.$$

Since this is true for all values of z (except z = 0) we can equate the
coefficients of $z^n$ and find that $a_{n+1}(x) = x^{2n+1}a_n(x)$. Thus, for n ≥ 0, we
have

$$a_{n+1}(x) = x^{(2n+1)+(2n-1)+\cdots+1}\,a_0(x) = x^{(n+1)^2}a_0(x).$$

By (19.8.2) the same is true when n+1 ≤ 0 and so $a_n(x) = x^{n^2}a_0(x)$ for all
n, provided x ≠ 0. But, when x = 0, the result is trivial. Hence

$$P(x,z) = a_0(x)\,S(x,z), \tag{19.8.3}$$

where

$$S(x,z) = \sum_{n=-\infty}^{\infty} x^{n^2}z^n.$$

To complete the proof of the theorem, we have to show that $a_0(x) = 1$.

If z has any fixed value other than zero and if $|x| < \tfrac12$ (say), the products
Q(x), R(x,z), $R(x,z^{-1})$ and the series S(x,z) are all uniformly convergent
with respect to x. Hence P(x,z) and S(x,z) represent continuous functions
of x and, as x → 0,

$$P(x,z) \to P(0,z) = 1,\qquad S(x,z) \to S(0,z) = 1.$$

It follows from (19.8.3) that $a_0(x) \to 1$ as x → 0.

Putting z = i, we have

$$S(x,i) = 1 + 2\sum_{n=1}^{\infty}(-1)^n x^{4n^2} = S(x^4,-1). \tag{19.8.4}$$

Again

$$R(x,i)\,R(x,i^{-1}) = \prod_{n=1}^{\infty}\bigl\{(1+ix^{2n-1})(1-ix^{2n-1})\bigr\} = \prod_{n=1}^{\infty}(1+x^{4n-2}),$$

$$Q(x) = \prod_{n=1}^{\infty}(1-x^{2n}) = \prod_{n=1}^{\infty}\bigl\{(1-x^{4n})(1-x^{4n-2})\bigr\},$$

and so

$$P(x,i) = \prod_{n=1}^{\infty}\bigl\{(1-x^{4n})(1-x^{8n-4})\bigr\} = \prod_{n=1}^{\infty}\bigl\{(1-x^{8n})(1-x^{8n-4})^2\bigr\} = P(x^4,-1). \tag{19.8.5}$$

Clearly $P(x^4,-1) \ne 0$, and so it follows from (19.8.3), (19.8.4), and
(19.8.5) that $a_0(x) = a_0(x^4)$. Using this repeatedly with $x^4, x^{4^2}, x^{4^3}, \ldots$
replacing x, we have

$$a_0(x) = a_0(x^4) = \cdots = a_0(x^{4^k})$$

for any positive integer k. But |x| < 1 and so $x^{4^k} \to 0$ as k → ∞. Hence

$$a_0(x) = \lim_{x\to 0} a_0(x) = 1.$$

This completes the proof of Theorem 352. 
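Jacobi's identity (19.8.1) can also be checked formally, as an identity between series in x whose coefficients are polynomials in z and 1/z. The sketch below is an added illustration; the representation of a series as a dictionary keyed by the pair (power of x, power of z), the truncation order N, and the function name `mul` are all our own choices. Factors whose lowest power of x exceeds N cannot affect the coefficients retained.

```python
N = 30  # keep powers of x up to x^N; powers of z are unrestricted


def mul(A, B):
    """Product of two series stored as {(x_power, z_power): coefficient}."""
    out = {}
    for (i, j), a in A.items():
        for (k, l), b in B.items():
            if i + k <= N:
                key = (i + k, j + l)
                out[key] = out.get(key, 0) + a * b
    return {key: value for key, value in out.items() if value != 0}


product = {(0, 0): 1}
n = 1
while 2 * n - 1 <= N:
    factors = (
        {(0, 0): 1, (2 * n, 0): -1},      # 1 - x^(2n)
        {(0, 0): 1, (2 * n - 1, 1): 1},   # 1 + x^(2n-1) z
        {(0, 0): 1, (2 * n - 1, -1): 1},  # 1 + x^(2n-1) / z
    )
    for factor in factors:
        product = mul(product, factor)
    n += 1

# The series side: sum of x^(n^2) z^n over all integers n with n^2 <= N.
series = {(k * k, k): 1 for k in range(-N, N + 1) if k * k <= N}

print(product == series)
```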

19.9. Special cases of Jacobi’s identity. If we write $x^k$ for x, $-x^l$ and
$x^l$ for z, and replace n by n+1 on the left-hand side of (19.8.1), we obtain

$$\prod_{n=0}^{\infty}\bigl\{(1-x^{2kn+k-l})(1-x^{2kn+k+l})(1-x^{2kn+2k})\bigr\} = \sum_{n=-\infty}^{\infty}(-1)^n x^{kn^2+ln}, \tag{19.9.1}$$

$$\prod_{n=0}^{\infty}\bigl\{(1+x^{2kn+k-l})(1+x^{2kn+k+l})(1-x^{2kn+2k})\bigr\} = \sum_{n=-\infty}^{\infty} x^{kn^2+ln}. \tag{19.9.2}$$

Some special cases are particularly interesting. 

(i) k = 1, l = 0 gives

$$\prod_{n=0}^{\infty}\bigl\{(1-x^{2n+1})^2(1-x^{2n+2})\bigr\} = \sum_{n=-\infty}^{\infty}(-1)^n x^{n^2},$$

$$\prod_{n=0}^{\infty}\bigl\{(1+x^{2n+1})^2(1-x^{2n+2})\bigr\} = \sum_{n=-\infty}^{\infty} x^{n^2},$$

two standard formulae from the theory of elliptic functions.

(ii) $k = \tfrac32$, $l = \tfrac12$ in (19.9.1) gives

$$\prod_{n=0}^{\infty}\bigl\{(1-x^{3n+1})(1-x^{3n+2})(1-x^{3n+3})\bigr\} = \sum_{n=-\infty}^{\infty}(-1)^n x^{\frac12 n(3n+1)}$$

or

Theorem 353:

$$(1-x)(1-x^2)(1-x^3)\cdots = \sum_{n=-\infty}^{\infty}(-1)^n x^{\frac12 n(3n+1)}.$$

This famous identity of Euler may also be written in the form

$$\begin{aligned}
(1-x)(1-x^2)(1-x^3)\cdots &= 1 + \sum_{n=1}^{\infty}(-1)^n\bigl\{x^{\frac12 n(3n-1)} + x^{\frac12 n(3n+1)}\bigr\} \\
&= 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \cdots.
\end{aligned} \tag{19.9.3}$$
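The expansion (19.9.3) is easily confirmed as far as any given power of x. The sketch below is an added illustration; the truncation order N and the variable names are our own.

```python
N = 100

# Expand (1-x)(1-x^2)(1-x^3)... up to x^N.
coeffs = [1] + [0] * N
for k in range(1, N + 1):
    for n in range(N, k - 1, -1):  # multiply by (1 - x^k)
        coeffs[n] -= coeffs[n - k]

# The series side: (-1)^n x^(n(3n+1)/2) for all integers n.
expected = [0] * (N + 1)
for n in range(-N, N + 1):
    exponent = n * (3 * n + 1) // 2
    if 0 <= exponent <= N:
        expected[exponent] += 1 if n % 2 == 0 else -1

print(coeffs == expected)
print([e for e in range(N + 1) if coeffs[e] != 0][:7])
```

The exponents which actually occur begin 0, 1, 2, 5, 7, 12, 15, in agreement with (19.9.3).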

(iii) $k = l = \tfrac12$ in (19.9.2) gives

$$\prod_{n=0}^{\infty}\bigl\{(1+x^n)(1-x^{2n+2})\bigr\} = \sum_{n=-\infty}^{\infty} x^{\frac12 n(n+1)},$$

which may be transformed, by use of (19.4.7), into

Theorem 354:

$$\frac{(1-x^2)(1-x^4)(1-x^6)\cdots}{(1-x)(1-x^3)(1-x^5)\cdots} = 1 + x + x^3 + x^6 + x^{10} + \cdots.$$

Here the indices on the right are the triangular numbers.*

(iv) $k = \tfrac52$, $l = \tfrac32$ and $k = \tfrac52$, $l = \tfrac12$ in (19.9.1) give

Theorem 355:

$$\prod_{n=0}^{\infty}\bigl\{(1-x^{5n+1})(1-x^{5n+4})(1-x^{5n+5})\bigr\} = \sum_{n=-\infty}^{\infty}(-1)^n x^{\frac12 n(5n+3)}.$$

Theorem 356:

$$\prod_{n=0}^{\infty}\bigl\{(1-x^{5n+2})(1-x^{5n+3})(1-x^{5n+5})\bigr\} = \sum_{n=-\infty}^{\infty}(-1)^n x^{\frac12 n(5n+1)}.$$

* The numbers $\tfrac12 n(n+1)$.
We shall require these formulae later.

As a final application, we replace x by $x^{1/2}$ and z by $x^{1/2}\zeta$ in (19.8.1). This
gives

$$\prod_{n=1}^{\infty}\bigl\{(1-x^n)(1+x^n\zeta)(1+x^{n-1}\zeta^{-1})\bigr\} = \sum_{n=-\infty}^{\infty} x^{\frac12 n(n+1)}\zeta^n,$$

or

$$(1+\zeta^{-1})\prod_{n=1}^{\infty}\bigl\{(1-x^n)(1+x^n\zeta)(1+x^n\zeta^{-1})\bigr\} = \sum_{m=0}^{\infty}(\zeta^m + \zeta^{-m-1})\,x^{\frac12 m(m+1)},$$

where on the right-hand side we have combined the terms which correspond
to n = m and n = −m−1. We deduce that

$$\begin{aligned}
\prod_{n=1}^{\infty}\bigl\{(1-x^n)(1+x^n\zeta)(1+x^n\zeta^{-1})\bigr\}
&= \sum_{m=0}^{\infty}\zeta^{-m}\left(\frac{1+\zeta^{2m+1}}{1+\zeta}\right)x^{\frac12 m(m+1)} \\
&= \sum_{m=0}^{\infty} x^{\frac12 m(m+1)}\,\zeta^{-m}\,(1 - \zeta + \zeta^2 - \cdots + \zeta^{2m})
\end{aligned} \tag{19.9.4}$$

for all ζ except ζ = 0 and ζ = −1. We now suppose the value of x fixed
and that ζ lies in the closed interval $-\tfrac32 \le \zeta \le -\tfrac12$. The infinite product
on the left and the infinite series on the right of (19.9.4) are then uniformly
convergent with respect to ζ. Hence each represents a continuous function
of ζ in this interval and we may let ζ → −1.

We have then 

Theorem 357:

$$\prod_{n=1}^{\infty}(1-x^n)^3 = \sum_{m=0}^{\infty}(-1)^m(2m+1)\,x^{\frac12 m(m+1)}.$$


This is another famous theorem of Jacobi. 
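Theorem 357 may be confirmed numerically in the same manner as Euler's identity. The sketch below is an added illustration; the truncation order N is an arbitrary choice of ours.

```python
N = 60

# Expand the cube of (1-x)(1-x^2)(1-x^3)... up to x^N.
coeffs = [1] + [0] * N
for k in range(1, N + 1):
    for _ in range(3):                 # the factor (1 - x^k) occurs three times
        for n in range(N, k - 1, -1):
            coeffs[n] -= coeffs[n - k]

# The series side: (-1)^m (2m+1) x^(m(m+1)/2).
expected = [0] * (N + 1)
m = 0
while m * (m + 1) // 2 <= N:
    expected[m * (m + 1) // 2] = (2 * m + 1) * (1 if m % 2 == 0 else -1)
    m += 1

print(coeffs == expected)
print(coeffs[:11])
```

The expansion begins 1 − 3x + 5x³ − 7x⁶ + 9x¹⁰.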
19.10. Applications of Theorem 353. Euler’s identity (19.9.3) has a
striking combinatorial interpretation. The coefficient of $x^n$ in

$$(1-x)(1-x^2)(1-x^3)\cdots$$

is

$$\sum (-1)^{\nu}, \tag{19.10.1}$$

where the summation is extended over all partitions of n into unequal parts,
and ν is the number of parts in such a partition. Thus the partition 3+2+1 of
6 contributes $(-1)^3$ to the coefficient of $x^6$. But (19.10.1) is E(n) − U(n),
where E(n) is the number of partitions of n into an even number of unequal
parts, and U(n) that into an odd number. Hence Theorem 353 may be
restated as

Theorem 358. E(n) = U(n) except when $n = \tfrac12 k(3k \pm 1)$, when

$$E(n) - U(n) = (-1)^k.$$

Thus

7 = 6 + 1 = 5 + 2 = 4 + 3 = 4 + 2 + 1,

E(7) = 3, U(7) = 2, E(7) − U(7) = 1,

and

$7 = \tfrac12 \cdot 2 \cdot (3 \cdot 2 + 1)$, k = 2.
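Theorem 358 can be tested by listing the partitions of n into unequal parts and counting them according to the parity of the number of parts. The sketch below is an added illustration; the names `excess` and `predicted` are ours.

```python
def distinct_partitions(n, largest=None):
    """Yield the partitions of n into unequal parts, in descending order."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def excess(n):
    """E(n) - U(n), by direct enumeration."""
    return sum(1 if len(p) % 2 == 0 else -1 for p in distinct_partitions(n))


def predicted(n):
    """(-1)^k if n = k(3k-1)/2 or k(3k+1)/2 for some k >= 1, otherwise 0."""
    k = 1
    while k * (3 * k - 1) // 2 <= n:
        if n in (k * (3 * k - 1) // 2, k * (3 * k + 1) // 2):
            return 1 if k % 2 == 0 else -1
        k += 1
    return 0


print(list(distinct_partitions(7)))
print([excess(n) for n in range(1, 16)])
```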

The identity may be used effectively for the calculation of p(n). For

$$(1 - x - x^2 + x^5 + x^7 - \cdots)\Bigl\{1 + \sum_{n=1}^{\infty} p(n)\,x^n\Bigr\} = \frac{1 - x - x^2 + x^5 + x^7 - \cdots}{(1-x)(1-x^2)(1-x^3)\cdots} = 1.$$

Hence, equating coefficients,

$$p(n) - p(n-1) - p(n-2) + p(n-5) + \cdots + (-1)^k p\{n - \tfrac12 k(3k-1)\} + (-1)^k p\{n - \tfrac12 k(3k+1)\} + \cdots = 0. \tag{19.10.2}$$

The number of terms on the left is about $2\sqrt{\tfrac23 n}$ for large n.

MacMahon used (19.10.2) to calculate p(n) up to n = 200, and found that

p(200) = 3972999029388.
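The recurrence (19.10.2) translates directly into a short program. The sketch below is an added illustration; the function name `partition_table` is ours, and we take p(0) = 1 and p(n) = 0 for negative n, as the recurrence requires.

```python
def partition_table(limit):
    """Return [p(0), p(1), ..., p(limit)] computed from (19.10.2)."""
    p = [1] + [0] * limit
    for n in range(1, limit + 1):
        total = 0
        k = 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1   # +, +, -, -, +, +, ... by pairs
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p


table = partition_table(200)
print(table[:11])
print(table[200])
```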

19.11. Elementary proof of Theorem 358. There is a very beauti- 
ful proof of Theorem 358, due to Franklin, which uses no algebraical 
machinery. 

We try to establish a (1,1) correspondence between partitions of the two
sorts considered in § 19.10. Such a correspondence naturally cannot be
exact, since an exact correspondence would prove that E(n) = U(n) for
all n.

We take a graph G representing a partition of n into any number of
unequal parts, in descending order. We call the lowest line AB

[Figure: graphs G and H]

(which may contain one point only) the ‘base’ β of the graph. From C, the
extreme north-east node, we draw the longest south-westerly line possible
in the graph; this also may contain one node only. This line CDE we call
the ‘slope’ σ of the graph. We write β < σ when, as in graph G, there are
more nodes in σ than in β, and use a similar notation in other cases. Then
there are three possibilities.

(a) β < σ. We move β into a position parallel to and outside σ, as shown
in graph H. This gives a new partition into decreasing unequal parts, and
into a number of such parts whose parity is opposite to that of the number
in G. We call this operation O, and the converse operation (removing σ
and placing it below β) Ω. It is plain that Ω is not possible, when β < σ,
without violating the conditions of the graph.

(b) β = σ. In this case O is possible (as in graph I) unless β meets σ (as
in graph J), when it is impossible. Ω is not possible in either case.

(c) β > σ. In this case O is always impossible. Ω is possible (as in
graph K) unless β meets σ and β = σ+1 (as in graph L). Ω is impossi-
ble in the last case because it would lead to a partition with two equal
parts.
[Figure: graphs K and L]

To sum up, there is a (1,1) correspondence between the two types of
partitions except in the cases exemplified by J and L. In the first of these
exceptional cases n is of the form

$$k + (k+1) + \cdots + (2k-1) = \tfrac12(3k^2 - k),$$

and in this case there is an excess of one partition into an even number
of parts, or one into an odd number, according as k is even or odd. In the
second case n is of the form

$$(k+1) + (k+2) + \cdots + 2k = \tfrac12(3k^2 + k),$$

and the excess is the same. Hence E(n) − U(n) is 0 unless $n = \tfrac12(3k^2 \pm k)$,
when $E(n) - U(n) = (-1)^k$. This is Euler’s theorem.
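Franklin's operations O and Ω are simple enough to be programmed. The following sketch is an added illustration of the argument above; the representation of a graph as a descending tuple of unequal parts, and the function name `franklin`, are our own. The function returns the partition paired with the given one, or `None` in the exceptional cases J and L.

```python
def distinct_partitions(n, largest=None):
    """Yield the partitions of n into unequal parts, in descending order."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def franklin(parts):
    """Apply O or Omega to a partition into unequal parts (descending)."""
    base = parts[-1]                  # number of nodes in the base
    slope = 1                         # number of nodes in the slope
    while slope < len(parts) and parts[slope] == parts[slope - 1] - 1:
        slope += 1
    meets = slope == len(parts)       # the base meets the slope
    if base <= slope:
        if meets and base == slope:
            return None               # exceptional case, as in graph J
        # Operation O: remove the base and lay it outside the slope.
        body = parts[:-1]
        return tuple(p + 1 for p in body[:base]) + body[base:]
    if meets and base == slope + 1:
        return None                   # exceptional case, as in graph L
    # Operation Omega: remove the slope and place it below the base.
    return tuple(p - 1 for p in parts[:slope]) + parts[slope:] + (slope,)


for n in (5, 7, 9):
    for parts in distinct_partitions(n):
        print(n, parts, "->", franklin(parts))
```

Each partition which is not exceptional is paired with one having a number of parts of the opposite parity, and applying the function twice restores the original partition.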

19.12. Congruence properties of p(n). In spite of the simplicity of the
definition of p(n), not very much is known about its arithmetic properties.

The simplest arithmetic properties known were found by Ramanujan.
Examining MacMahon’s table of p(n), he was led first to conjecture,
and then to prove, three striking arithmetic properties associated with the
moduli 5, 7, and 11. No analogous results are known to modulus 2 or 3,
although Newman has found some further results to modulus 13.

Theorem 359:

$$p(5m+4) \equiv 0 \pmod 5.$$

Theorem 360:

$$p(7m+5) \equiv 0 \pmod 7.$$

Theorem 361*:

$$p(11m+6) \equiv 0 \pmod{11}.$$
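The three congruences are readily observed in a table of p(n). The sketch below is an added illustration; the table is formed here by multiplying out the generating function (19.3.1), and the limit of 600 is an arbitrary choice of ours. Such a table verifies the congruences only within its range.

```python
LIMIT = 600

# p[n] = coefficient of x^n in 1/((1-x)(1-x^2)(1-x^3)...), for n <= LIMIT.
p = [1] + [0] * LIMIT
for part in range(1, LIMIT + 1):
    for n in range(part, LIMIT + 1):
        p[n] += p[n - part]

mod5 = all(p[n] % 5 == 0 for n in range(4, LIMIT + 1, 5))
mod7 = all(p[n] % 7 == 0 for n in range(5, LIMIT + 1, 7))
mod11 = all(p[n] % 11 == 0 for n in range(6, LIMIT + 1, 11))
print(p[4], p[9], p[14])   # the first values of p(5m + 4)
print(mod5, mod7, mod11)
```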
We give here a proof of Theorem 359. Theorem 360 may be proved in
the same kind of way, but Theorem 361 is more difficult.

By Theorems 353 and 357,

$$\begin{aligned}
x\{(1-x)(1-x^2)\cdots\}^4 &= x(1-x)(1-x^2)\cdots\{(1-x)(1-x^2)\cdots\}^3 \\
&= x(1 - x - x^2 + x^5 + \cdots)(1 - 3x + 5x^3 - 7x^6 + \cdots) \\
&= \sum_{r=-\infty}^{\infty}\sum_{s=0}^{\infty}(-1)^{r+s}(2s+1)\,x^k,
\end{aligned}$$

where

$$k = k(r,s) = 1 + \tfrac12 r(3r+1) + \tfrac12 s(s+1).$$

We consider in what circumstances k is divisible by 5.

Now

$$2(r+1)^2 + (2s+1)^2 = 8k - 10r^2 - 5 \equiv 8k \pmod 5.$$

Hence k ≡ 0 (mod 5) implies

$$2(r+1)^2 + (2s+1)^2 \equiv 0 \pmod 5.$$

Also

$$2(r+1)^2 \equiv 0,\ 2,\ \text{or } 3, \qquad (2s+1)^2 \equiv 0,\ 1,\ \text{or } 4 \pmod 5,$$

and we get 0 on addition only if $2(r+1)^2$ and $(2s+1)^2$ are each divisible by
5. Hence k can be divisible by 5 only if 2s + 1 is divisible by 5, and thus the
coefficient of $x^{5m+5}$ in

$$x\{(1-x)(1-x^2)\cdots\}^4$$

is divisible by 5.

Next, in the binomial expansion of $(1-x)^{-5}$, all the coefficients are divi-
sible by 5, except those of 1, $x^5$, $x^{10}$, ..., which have the remainder 1.† We
may express this by writing

$$\frac{1}{(1-x)^5} \equiv \frac{1}{1-x^5} \pmod 5;$$

the notation, which is an extension of that used for polynomials in § 7.2,
implying that the coefficients of every power of x are congruent. It follows
that

$$\frac{1-x^5}{(1-x)^5} \equiv 1 \pmod 5$$

and

$$\frac{(1-x^5)(1-x^{10})(1-x^{15})\cdots}{\{(1-x)(1-x^2)(1-x^3)\cdots\}^5} \equiv 1 \pmod 5.$$

† Theorem 76 of Ch. VI.

Hence the coefficient of $x^{5m+5}$ in

$$x\,\frac{(1-x^5)(1-x^{10})\cdots}{(1-x)(1-x^2)\cdots} = x\{(1-x)(1-x^2)\cdots\}^4\,\frac{(1-x^5)(1-x^{10})\cdots}{\{(1-x)(1-x^2)\cdots\}^5}$$

is a multiple of 5. Finally, since

$$\frac{x}{(1-x)(1-x^2)\cdots} = x\,\frac{(1-x^5)(1-x^{10})\cdots}{(1-x)(1-x^2)\cdots} \times (1 + x^5 + x^{10} + \cdots)(1 + x^{10} + x^{20} + \cdots)\cdots,$$

the coefficient of $x^{5m+5}$ in

$$\frac{x}{(1-x)(1-x^2)(1-x^3)\cdots} = x + \sum_{n=2}^{\infty} p(n-1)\,x^n$$

is a multiple of 5; and this is Theorem 359.

The proof of Theorem 360 is similar. We use the square of Jacobi’s series
$1 - 3x + 5x^3 - 7x^6 + \cdots$ instead of the product of Euler’s and Jacobi’s
series.

There are also congruences to moduli $5^2$, $7^2$, and $11^2$, such as

$$p(25m+24) \equiv 0 \pmod{5^2}.$$

Ramanujan made the general conjecture that if

$$\delta = 5^a 7^b 11^c,$$

and

$$24n \equiv 1 \pmod{\delta},$$

then

$$p(n) \equiv 0 \pmod{\delta}.$$

It is only necessary to consider the cases $\delta = 5^a, 7^b, 11^c$, since all others
would follow as corollaries.

Ramanujan proved the congruences for $5^2$, $7^2$, $11^2$, Krečmar that for $5^3$,
and Watson that for general $5^a$. But Gupta, in extending MacMahon’s table
up to 300, found that

p(243) = 133978259344888

is not divisible by $7^3 = 343$; and, since 24 · 243 ≡ 1 (mod 343), this
contradicts the conjecture for $7^3$. The conjecture for $7^b$ had therefore to be
modified, and Watson found and proved the appropriate modification, viz.
that $p(n) \equiv 0 \pmod{7^b}$ if b > 1 and $24n \equiv 1 \pmod{7^{2b-2}}$.
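Gupta's observation is easily repeated. The sketch below is an added illustration; it computes p(243) from the recurrence (19.10.2), with a function name of our own, and examines the divisibility of the result by 7² and 7³.

```python
def partition_table(limit):
    """Return [p(0), ..., p(limit)] computed from the recurrence (19.10.2)."""
    p = [1] + [0] * limit
    for n in range(1, limit + 1):
        total = 0
        k = 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p


p243 = partition_table(243)[243]
print(p243)
print((24 * 243) % 343)        # 24n is congruent to 1 (mod 7^3)
print(p243 % 49, p243 % 343)   # divisible by 7^2 but not by 7^3
```

The divisibility by 7² agrees with Watson's modified statement for b = 2, since 24 · 243 ≡ 1 (mod 7²).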

D. H. Lehmer used a quite different method based upon the analytic
theory of Hardy and Ramanujan and of Rademacher to calculate p(n) for
particular n. By this means he verified the truth of the conjecture for the
first values of n associated with the moduli $11^3$ and $11^4$. Subsequently
Lehner proved the conjecture for $11^3$ and Atkin for general $11^c$.

Dyson conjectured and Atkin and Swinnerton-Dyer proved certain
remarkable results from which Theorems 359 and 360, but not 361, are
immediate corollaries. Thus, let us define the rank of a partition as the
largest part minus the number of parts, so that, for example, the rank of
a partition and that of the conjugate partition differ only in sign. Next we
arrange the partitions of a number in five classes, each class containing
the partitions whose rank has the same residue (mod 5). Then, if n ≡ 4
(mod 5), the number of partitions in each of the five classes is the same and
Theorem 359 is an immediate corollary. There is a similar result leading to
Theorem 360.
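The division of the partitions of n into classes according to rank can be observed directly for small n. The sketch below is an added illustration; the function names are ours, and the residues are taken in the range 0 to modulus − 1.

```python
from collections import Counter


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def rank_classes(n, modulus):
    """Count the partitions of n by the residue of their rank."""
    return Counter((p[0] - len(p)) % modulus for p in partitions(n))


for n in (4, 9, 14):
    print(n, sorted(rank_classes(n, 5).items()))
print(5, sorted(rank_classes(5, 7).items()))
```

For n = 4, 9, 14 each of the five classes contains p(n)/5 partitions, and for n = 5 each of the seven classes (mod 7) contains one.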

19.13. The Rogers-Ramanujan identities. We end this chapter with
two theorems which resemble Theorems 345 and 346 superficially, but are
much more difficult to prove. These are

Theorem 362:

$$1 + \frac{x}{1-x} + \frac{x^4}{(1-x)(1-x^2)} + \frac{x^9}{(1-x)(1-x^2)(1-x^3)} + \cdots = \frac{1}{(1-x)(1-x^6)\cdots(1-x^4)(1-x^9)\cdots},$$

i.e.

$$1 + \sum_{m=1}^{\infty}\frac{x^{m^2}}{(1-x)(1-x^2)\cdots(1-x^m)} = \prod_{m=0}^{\infty}\frac{1}{(1-x^{5m+1})(1-x^{5m+4})}. \tag{19.13.1}$$

Theorem 363:

$$1 + \frac{x^2}{1-x} + \frac{x^6}{(1-x)(1-x^2)} + \frac{x^{12}}{(1-x)(1-x^2)(1-x^3)} + \cdots = \frac{1}{(1-x^2)(1-x^7)\cdots(1-x^3)(1-x^8)\cdots},$$

i.e.

$$1 + \sum_{m=1}^{\infty}\frac{x^{m(m+1)}}{(1-x)(1-x^2)\cdots(1-x^m)} = \prod_{m=0}^{\infty}\frac{1}{(1-x^{5m+2})(1-x^{5m+3})}. \tag{19.13.2}$$

The series here differ from those in Theorems 345 and 346 only in that $x^2$
is replaced by x in the denominators. The peculiar interest of the formulae
lies in the unexpected part played by the number 5.

We observe first that the theorems have, like Theorems 345 and 346, a
combinatorial interpretation. Consider Theorem 362, for example. We can
exhibit any square $m^2$ as

$$m^2 = 1 + 3 + 5 + \cdots + (2m-1)$$

or as shown by the black dots in the graph M, in which m = 4. If we now take
any partition of $n - m^2$ into m parts at most, with the parts in descending
order, and add it to the graph, as shown by the circles of M, where m = 4
and $n = 4^2 + 11 = 27$, we obtain a partition of n (here 27 = 11+8+6+2) into
parts without repetitions or sequences, or parts whose minimal difference
is 2. The left-hand side of (19.13.1) enumerates this type of partition of n.

• • • • • • • ○ ○ ○ ○
• • • • • ○ ○ ○
• • • ○ ○ ○
• ○

M

On the other hand, the right-hand side enumerates partitions into num-
bers of the forms 5m + 1 and 5m + 4. Hence Theorem 362 may be restated
as a purely ‘combinatorial’ theorem, viz.

Theorem 364. The number of partitions of n with minimal difference 2
is equal to the number of partitions into parts of the forms 5m + 1 and
5m + 4.

Thus, when n = 9, there are 5 partitions of each type,

9, 8+1, 7+2, 6+3, 5+3+1

of the first kind, and

9, 6+1+1+1, 4+4+1, 4+1+1+1+1+1,
1+1+1+1+1+1+1+1+1

of the second.

Similarly, the combinatorial equivalent of Theorem 363 is 

Theorem 365. The number of partitions of n into parts not less than 2, 
and with minimal difference 2, is equal to the number of partitions of n into 
parts of the forms 5m + 2 and 5m + 3. 
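Theorems 364 and 365 can be tested by counting the two kinds of partition independently. The sketch below is an added illustration; the function names `count_gap2` and `count_residues` are ours.

```python
from functools import lru_cache


def count_gap2(n, smallest=1):
    """Partitions of n into parts >= smallest with minimal difference 2."""
    @lru_cache(maxsize=None)
    def go(remaining, min_part):
        if remaining == 0:
            return 1
        # Choose the next part (in ascending order); the one after it
        # must exceed it by at least 2.
        return sum(go(remaining - part, part + 2)
                   for part in range(min_part, remaining + 1))
    return go(n, smallest)


def count_residues(n, residues):
    """Partitions of n into parts whose residue (mod 5) lies in `residues`."""
    coeffs = [1] + [0] * n
    for part in range(1, n + 1):
        if part % 5 in residues:
            for k in range(part, n + 1):
                coeffs[k] += coeffs[k - part]
    return coeffs[n]


print(count_gap2(9), count_residues(9, {1, 4}))              # Theorem 364
print(count_gap2(9, smallest=2), count_residues(9, {2, 3}))  # Theorem 365
```

For n = 9 the first pair of numbers is 5 and 5, in agreement with the lists given above.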

We can prove this equivalence in the same way, starting from the identity

$$m(m+1) = 2 + 4 + 6 + \cdots + 2m.$$

The proof which we give of these theorems in the next section was found 
independently by Rogers and Ramanujan. We state it in the form given by 
Rogers. It is fairly straightforward, but unilluminating, since it depends 
on writing down an auxiliary function whose genesis remains obscure. It 
is natural to ask for an elementary proof on some such lines as those of 
§ 19.11, and such a proof was found by Schur; but Schur’s proof is too 
elaborate for insertion here. There are other proofs by Rogers and Schur, 
and one by Watson based on different ideas. No proof is really easy (and it 
would perhaps be unreasonable to expect an easy proof). 
19.14. Proof of Theorems 362 and 363. We write

$$P_0 = 1,\qquad P_r = \prod_{s=1}^{r}\frac{1}{1-x^s},\qquad Q_r = Q_r(a) = \prod_{s=r}^{\infty}\frac{1}{1-ax^s},$$

$$\lambda(r) = \tfrac12 r(5r+1),$$

and define the operator η by

$$\eta f(a) = f(ax).$$

We introduce the auxiliary function

$$H_m = H_m(a) = \sum_{r=0}^{\infty}(-1)^r a^{2r} x^{\lambda(r)-mr}(1 - a^m x^{2mr})\,P_r Q_r, \tag{19.14.1}$$

where m = 0, 1, or 2. Our object is to expand $H_1$ and $H_2$ in powers of a.
We prove first that

$$H_m - H_{m-1} = a^{m-1}\,\eta H_{3-m} \qquad (m = 1, 2). \tag{19.14.2}$$

We have

$$H_m - H_{m-1} = \sum_{r=0}^{\infty}(-1)^r a^{2r} x^{\lambda(r)}\,C_{mr}\,P_r Q_r,$$

where

$$\begin{aligned}
C_{mr} &= x^{-mr} - a^m x^{mr} - x^{(1-m)r} + a^{m-1}x^{r(m-1)} \\
&= a^{m-1}x^{r(m-1)}(1 - ax^r) + x^{-mr}(1 - x^r).
\end{aligned}$$

Now

$$(1 - ax^r)\,Q_r = Q_{r+1},\qquad (1 - x^r)\,P_r = P_{r-1},\qquad 1 - x^0 = 0,$$

and so

$$H_m - H_{m-1} = \sum_{r=0}^{\infty}(-1)^r a^{2r+m-1} x^{\lambda(r)+r(m-1)}\,P_r Q_{r+1} + \sum_{r=1}^{\infty}(-1)^r a^{2r} x^{\lambda(r)-mr}\,P_{r-1} Q_r.$$

In the second sum on the right-hand side of this identity we change r into
r + 1. Thus

$$H_m - H_{m-1} = \sum_{r=0}^{\infty}(-1)^r D_{mr}\,P_r Q_{r+1},$$

where

$$\begin{aligned}
D_{mr} &= a^{2r+m-1} x^{\lambda(r)+r(m-1)} - a^{2(r+1)} x^{\lambda(r+1)-m(r+1)} \\
&= a^{m-1+2r} x^{\lambda(r)+r(m-1)}\bigl(1 - a^{3-m} x^{(2r+1)(3-m)}\bigr) \\
&= a^{m-1}\,\eta\bigl\{a^{2r} x^{\lambda(r)-r(3-m)}\bigl(1 - a^{3-m} x^{2r(3-m)}\bigr)\bigr\},
\end{aligned}$$

since λ(r + 1) − λ(r) = 5r + 3. Also $Q_{r+1} = \eta Q_r$ and so

$$\begin{aligned}
H_m - H_{m-1} &= a^{m-1}\,\eta\sum_{r=0}^{\infty}(-1)^r a^{2r} x^{\lambda(r)-r(3-m)}\bigl(1 - a^{3-m} x^{2r(3-m)}\bigr)P_r Q_r \\
&= a^{m-1}\,\eta H_{3-m},
\end{aligned}$$

which is (19.14.2).

If we put m = 1 and m = 2 in (19.14.2) and remember that $H_0 = 0$,
we have

$$H_1 = \eta H_2,\qquad H_2 - H_1 = a\,\eta H_1, \tag{19.14.3}$$

so that

$$H_2 = \eta H_2 + a\,\eta^2 H_2. \tag{19.14.4}$$

We use this to expand $H_2$ in powers of a. If

$$H_2 = c_0 + c_1 a + \cdots = \sum c_s a^s,$$

where the $c_s$ are independent of a, then $c_0 = 1$ and (19.14.4) gives

$$\sum c_s a^s = \sum c_s x^s a^s + \sum c_s x^{2s} a^{s+1}.$$

Hence, equating the coefficients of $a^s$, we have

$$c_1 = \frac{1}{1-x},\qquad c_s = \frac{x^{2s-2}}{1-x^s}\,c_{s-1} = \frac{x^{2+4+\cdots+2(s-1)}}{(1-x)\cdots(1-x^s)} = x^{s(s-1)}P_s.$$

Hence

$$H_2(a) = \sum_{s=0}^{\infty} a^s x^{s(s-1)} P_s.$$

If we put a = x, the right-hand side of this is the series in (19.13.1). Also
$P_r Q_r(x) = P_\infty$ and so, by (19.14.1),

$$\begin{aligned}
H_2(x) &= P_\infty \sum_{r=0}^{\infty}(-1)^r x^{\lambda(r)}\bigl(1 - x^{2(2r+1)}\bigr) \\
&= P_\infty\Bigl\{\sum_{r=0}^{\infty}(-1)^r x^{\lambda(r)} + \sum_{r=1}^{\infty}(-1)^r x^{\lambda(r-1)+2(2r-1)}\Bigr\} \\
&= P_\infty\Bigl\{1 + \sum_{r=1}^{\infty}(-1)^r\bigl(x^{\frac12 r(5r+1)} + x^{\frac12 r(5r-1)}\bigr)\Bigr\}.
\end{aligned}$$

Hence, by Theorem 356,

$$H_2(x) = P_\infty \prod_{n=0}^{\infty}\bigl\{(1-x^{5n+2})(1-x^{5n+3})(1-x^{5n+5})\bigr\} = \prod_{n=0}^{\infty}\frac{1}{(1-x^{5n+1})(1-x^{5n+4})}.$$

This completes the proof of Theorem 362.
Again, by (19.14.3),

$$H_1(a) = \eta H_2(a) = H_2(ax) = \sum_{s=0}^{\infty} a^s x^{s^2} P_s$$

and, for a = x, the right-hand side becomes the series in (19.13.2). Using
(19.14.1) and Theorem 355, we complete the proof of Theorem 363 in the
same way as we did that of Theorem 362.
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19.15. Ramanujan’s continued fraction. We can write (19.14.14) in 
the form 


so that 


H 2 (a,x) = Hi(ax, x) + aH 2 (ax i ,x) 


Hiiax; x ) = H 2 (ax 2 ,x) + ax// 2 (ax\x). 


Hence, if we define F (a) by 


F(a ) = F(a,x) = H\(a,x) = 77^/2 (a, x) = H 2 (ax,x) 
, ax a 2 x 4 

= 1 + r^ + a-x)(i-x2) + ' "• 


then F(a) satisfies 


F(ax”) = F(ax n+1 ) + ax w+ 1 F(ax ,,+2 ). 


Hence, if 


= 


F(ax n ) 

F(ax n+i y 


we have 


**« = 1 + 


ax " +1 


and hence uo = F(a)/F(ax ) may be developed formally as 


(19.15.1) 


F(a) , , ax ax 2 ax 3 
F{ax) = + T+T+ ! + 


a ’continued fraction’ of a different type from those which we considered 
in Ch. X. 

We have no space to construct a theory of such fractions here. It is not
difficult to show that, when $|x| < 1$,

\[ 1 + \cfrac{ax}{1 + \cfrac{ax^2}{1 + \cdots + \cfrac{ax^n}{1}}} \]

tends to a limit by means of which we can define the right-hand side of
(19.15.1). If we take this for granted, we have, in particular,

\[ \frac{F(1)}{F(x)} = 1 + \cfrac{x}{1 + \cfrac{x^2}{1 + \cfrac{x^3}{1 + \cdots}}}, \]

and so

\[
1 + \cfrac{x}{1 + \cfrac{x^2}{1 + \cdots}}
= \frac{1 - x^2 - x^3 + x^9 + \cdots}{1 - x - x^4 + x^7 + \cdots}
= \frac{(1-x^2)(1-x^7)\cdots(1-x^3)(1-x^8)\cdots}{(1-x)(1-x^6)\cdots(1-x^4)(1-x^9)\cdots}.
\]


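It is known from the theory of elliptic functions that these products and
series can be calculated for certain special values of $x$, and in particular
when $x = e^{-2\pi\sqrt{h}}$ and $h$ is rational. In this way Ramanujan proved that,
for example,

\[
\cfrac{1}{1 + \cfrac{e^{-2\pi}}{1 + \cfrac{e^{-4\pi}}{1 + \cfrac{e^{-6\pi}}{1 + \cdots}}}}
= \left\{ \sqrt{\frac{5 + \sqrt5}{2}} - \frac{\sqrt5 + 1}{2} \right\} e^{\frac25 \pi}.
\]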
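Both the product formula for the continued fraction and Ramanujan's special value lend themselves to a quick floating-point check. The sketch below is illustrative only: the function names, the truncation depths, and the sample point $x = 0.3$ are assumptions made for the example. It uses Ramanujan's evaluation in the form $1/(1 + e^{-2\pi}/(1 + \cdots)) = \{\sqrt{(5+\sqrt5)/2} - (\sqrt5+1)/2\}\,e^{2\pi/5}$.

```python
import math

def cfrac(x, depth=60):
    """1 + x/(1 + x^2/(1 + x^3/(1 + ...))), truncated after `depth` levels."""
    value = 1.0
    for n in range(depth, 0, -1):
        value = 1.0 + x**n / value
    return value

def product(x, terms=200):
    """prod (1-x^(5n+2))(1-x^(5n+3)) / ((1-x^(5n+1))(1-x^(5n+4))) over n >= 0."""
    num = den = 1.0
    for n in range(terms):
        num *= (1 - x**(5 * n + 2)) * (1 - x**(5 * n + 3))
        den *= (1 - x**(5 * n + 1)) * (1 - x**(5 * n + 4))
    return num / den

# Ramanujan's closed form at x = e^(-2*pi), for the reciprocal of the fraction.
ramanujan_value = (math.sqrt((5 + math.sqrt(5)) / 2)
                   - (math.sqrt(5) + 1) / 2) * math.exp(2 * math.pi / 5)
```

For $|x| < 1$ the truncated fraction and the truncated product agree to rounding error, and `1 / cfrac(math.exp(-2 * math.pi))` agrees with `ramanujan_value` to the same accuracy.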


NOTES 

§19.1. There are general accounts of the earlier theory of partitions in Bachmann, Niedere
Zahlentheorie, ii, ch. 3; Netto, Combinatorik (second ed. by Brun and Skolem, 1927); and
MacMahon, Combinatory analysis, ii. For references to later work, see the survey by
Gupta (J. Res. Nat. Bur. Standards B74 (1970), 1-29); Andrews, Partitions; Andrews
and Eriksson, Integer Partitions; Ono and Ahlgren (Notices Amer. Math. Soc., 48 (2001),
978-84); Ono, The Web of Modularity.

§§ 19.3-5. All of the formulas of these sections are Euler's. More extensive developments
of these methods can be found in Andrews, Partitions, ch. 2 and Andrews and Eriksson,
Integer Partitions, ch. 5. For historical references, see Dickson, History, ii, ch. 3.

§19.6. Theorem 348 (the q-binomial theorem) and Theorem 349 (the q-binomial series)
are not in Euler's works. Cauchy studied them, but probably they predate him. Further
applications of these results appear in Andrews, Partitions, ch. 3, and Andrews and Eriksson,
ch. 7.

§19.7. While this formula is often attributed to Euler, its first published appearance is 
by Jacobi, Fundamenta nova, §64. Indeed, Jacobi needed a generalization of Theorem 351 
for his original proof of Theorem 352. 

§19.8. Theorem 352 is often referred to as Jacobi’s triple product identity, (Jacobi, 
Fundamenta nova, §64). The theorem was known to Gauss. The proof given here is ascribed 
to Jacobi by Enneper; Mr. R. F. Whitehead drew our attention to it. Wright (J. London Math. 
Soc. 40 (1965), 55-57) gives a simple combinatorial proof of Theorem 352, using arrays 
of points as in §§19.5, 19.6, and 19.11. A full history of the method used by Wright and 
an extensive application of it are given by Andrews (Memoirs of the Amer. Math. Soc. 
49 (1984)). Alternative proofs appear in Andrews, Partitions, ch. 2, and in Andrews and
Eriksson, Integer Partitions, ch. 8.

§19.9. Theorem 353 is due to Euler; for references see Bachmann, Niedere Zahlentheorie,
ii, 163, or Dickson, History, ii, 103. Theorem 354 was proved by Gauss in 1808 (Werke,
ii, 20), and Theorem 357 by Jacobi (Fundamenta nova, §66). Professor D. H. Lehmer
suggested the proof of Theorem 357 given here.

§19.10. MacMahon's table is printed in Proc. London Math. Soc. (2) 17 (1918), 114-15,
and has subsequently been extended to 600 (Gupta, ibid. 39 (1935), 142-9, and
42 (1937), 546-9), and to 1000 (Gupta, Gwyther, and Miller, Roy. Soc. Math. Tables 4
(Cambridge, 1958)). Recently Sun Tae Soh has prepared a program for computing p(n) for
n < 22,000,000 (cf. http://trinitas.mju.ac.kr/intro2numbpart.html).

§19.11. F. Franklin (Comptes rendus, 92 (1881), 448-50). We observe that, if we
use this method to prove Theorem 358, i.e. Theorem 353, we can shorten the proof of
Theorem 352 in §19.8. We proceed as before up to (19.8.3). We then put
$x = y^{3/2}$, $z = -y^{1/2}$ and have

\[
P(x, z) = \prod_{n=1}^{\infty} \bigl\{ (1 - y^{3n})(1 - y^{3n-1})(1 - y^{3n-2}) \bigr\}
= \prod_{m=1}^{\infty} (1 - y^m)
\]

and

\[ S(x, z) = \sum_{n=-\infty}^{\infty} (-1)^n y^{\frac12 n(3n+1)} = P(x, z) \]

by Theorem 353, so that $a_0(x) = 1$.

§19.12. See Ramanujan, Collected Papers, nos. 25, 28, 30. These papers contain
complete proofs of the congruences to moduli 5, 7, and 11 only. On p. 213 he states identities
which involve the congruences to moduli $5^2$ and $7^2$ as corollaries, and these identities were
proved later by Darling (Proc. London Math. Soc. (2) 19 (1921), 350-72) and Mordell (ibid.
20 (1922), 408-16). An unpublished manuscript of Ramanujan dealt with many instances
of his conjecture; this document has been retrieved by Berndt and Ono (The Andrews
Festschrift, Springer, 2001, pp. 39-110).

The papers referred to at the end of the section are Gupta's mentioned in the Note to
§19.10; Krečmar (Bulletin de l'acad. des sciences de l'URSS (7) 6 (1933), 763-800);
Lehmer (Journal London Math. Soc. 11 (1936), 114-18 and Bull. Amer. Math. Soc. 44
(1938), 84-90); Watson (Journal für Math. 179 (1938), 97-128); Lehner (Proc. Amer.
Math. Soc. 1 (1950), 172-81); Dyson (Eureka 8 (1944), 10-15); Atkin and
Swinnerton-Dyer (Proc. London Math. Soc. (3) 4 (1954), 84-106). Atkin (Glasgow Math. J. 8 (1967),
14-32) proved the $11^c$ result for general c and has also found a number of other congruences
of a more complicated character.

More recently Ono (The Web of Modularity) and his colleagues have vastly expanded
our knowledge of partition function congruences. Andrews and Garvan (Bull. Amer. Math.
Soc. 18 (1988), 167-71) found the 'crank' conjectured by Dyson; Mahlburg (Proc. Nat.
Acad. Sci. 102 (2005), 15373-76) has related the crank to the cornucopia of congruences
discovered by Ono.

§§19.13-14. For the history of the Rogers-Ramanujan identities, first found by Rogers
in 1894, see the note by Hardy reprinted on pp. 344-5 of Ramanujan's Collected papers,
and Hardy, Ramanujan, ch. 6. Schur's proofs appeared in the Berliner Sitzungsberichte
(1917), 302-21, and Watson's in the Journal London Math. Soc. 4 (1929), 4-9. Hardy,
Ramanujan, 95-99 and 107-11, gives other variations of the proofs.

Selberg, Avhandlinger Norske Akad. (1936), no. 8, has generalized the argument of
Rogers and Ramanujan, and found similar, but less simple, formulae associated with the
number 7. Dyson, Journal London Math. Soc. 18 (1943), 35-39, has pointed out that these
also may be found in Rogers's work, and has simplified the proofs considerably.

More recently, development of the theory and extension of the Rogers-Ramanujan
identities has been very active. Accounts of these discoveries can be found in surveys by Alder
(Amer. Math. Monthly, 76 (1969), 733-46); Alladi (Number Theory, Paris 1992-93,
Cambridge University Press (1995), 1-36); Andrews (Advances in Math., 9 (1972), 10-51; Bull.
Amer. Math. Soc., 80 (1974), 1033-52; Memoirs Amer. Math. Soc., 152 (1974), I+86 pp.;
Pac. J. Math. 114 (1984), 267-83). Applications in physics are surveyed by Berkovich and
McCoy (Proc. ICM 1998, III, 163-72). See also Andrews, Partitions.

Mr. C. Sudler suggested a substantial improvement in the presentation of the proof in
§19.14.

§19.15. Recent discoveries concerning the Rogers-Ramanujan continued fraction are
discussed in Andrews and Berndt, Ramanujan's Lost Notebook, Part I, chs. 1-8.
THE REPRESENTATION OF A NUMBER
BY TWO OR FOUR SQUARES 

20.1. Waring's problem: the numbers g(k) and G(k). Waring's
problem is that of the representation of positive integers as sums of a fixed
number $s$ of non-negative $k$th powers. It is the particular case of the general
problem of §19.1 in which the $a$ are

\[ 0^k, 1^k, 2^k, 3^k, \ldots \]

and $s$ is fixed. When $k = 1$, the problem is that of partitions into $s$ parts of
unrestricted form; such partitions are enumerated, as we saw in Ch. XIX,
by the function

\[ \frac{1}{(1-x)(1-x^2)\cdots(1-x^s)}. \]

Hence we take $k \geq 2$.

It is plainly impossible to represent all integers if $s$ is too small, for
example if $s = 1$. Indeed it is impossible if $s < k$. For the number of
values of $x_1$ for which $x_1^k \leq n$ does not exceed $n^{1/k} + 1$; and so the number
of sets of values $x_1, x_2, \ldots, x_{k-1}$ for which

\[ x_1^k + \cdots + x_{k-1}^k \leq n \]

does not exceed

\[ (n^{1/k} + 1)^{k-1} = n^{(k-1)/k} + O(n^{(k-2)/k}). \]

Hence most numbers are not representable by $k - 1$ or fewer $k$th powers.

The first question that arises is whether, for a given $k$, there is any fixed
$s = s(k)$ such that

\[ n = x_1^k + x_2^k + \cdots + x_s^k \tag{20.1.1} \]

is soluble for every $n$.

The answer is by no means obvious. For example, if the $a$ of §19.1 are the numbers

\[ 1, 2, 2^2, \ldots, 2^m, \ldots, \]

then the number

\[ 2^{m+1} - 1 = 1 + 2 + 2^2 + \cdots + 2^m \]

is not representable by fewer than $m + 1$ numbers $a$, and we have $m + 1 \to \infty$ when
$n = 2^{m+1} - 1 \to \infty$. Hence it is not true that all numbers are representable by a fixed
number of powers of 2.

Waring stated without proof that every number is the sum of 4 squares, 
of 9 cubes, of 19 biquadrates, ‘and so on’. His language implies that he 
believed that the answer to our question is affirmative, that (20.1.1) is 
soluble for each fixed k, any positive n, and an s = s(k) depending only 
on k. It is very improbable that Waring had any sufficient grounds for his 
assertion, and it was not until more than 100 years later that Hilbert first 
proved it true. 

A number representable by $s$ $k$th powers is plainly representable by any
larger number. Hence, if all numbers are representable by $s$ $k$th powers,
there is a least value of $s$ for which this is true. This least value of $s$ is
denoted by $g(k)$. We shall prove in this chapter that $g(2) = 4$, that is to say
that any number is representable by four squares and that four is the least
number of squares by which all numbers are representable. In Ch. XXI we
shall prove that $g(3)$ and $g(4)$ exist, but without determining their values.

There is another number in some ways still more interesting than g(k). 
Let us suppose, to fix our ideas, that $k = 3$. It is known that $g(3) = 9$;
every number is representable by 9 or fewer cubes, and every number,
except $23 = 2 \cdot 2^3 + 7 \cdot 1^3$ and

\[ 239 = 2 \cdot 4^3 + 4 \cdot 3^3 + 3 \cdot 1^3, \]

can be represented by 8 or fewer cubes. In fact, all sufficiently large num- 
bers are representable by 7 or fewer. Numerical evidence indicates that 
only 15 other numbers, of which the largest is 454, require so many cubes 
as 8, and that 7 suffice from 455 onwards. 
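The numerical evidence mentioned here is easy to reproduce on a small range. The following sketch is an illustration and not part of the original text: the function name `min_cubes` and the bound 1000 are assumptions of the example. It computes, by dynamic programming, the least number of positive cubes needed for each $n$ up to the bound.

```python
def min_cubes(limit):
    """best[n] = least number of positive cubes with sum n, for 0 <= n <= limit."""
    cubes = []
    c = 1
    while c ** 3 <= limit:
        cubes.append(c ** 3)
        c += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # remove one cube q and reuse the answer for n - q
        best[n] = 1 + min(best[n - q] for q in cubes if q <= n)
    return best

best = min_cubes(1000)
need9 = [n for n in range(1, 1001) if best[n] == 9]
need8 = [n for n in range(1, 1001) if best[n] == 8]
```

On this range `need9` contains only 23 and 239, and `need8` has 15 members with largest element 454, in agreement with the statements above; a finite computation of this kind says nothing about numbers beyond the chosen bound.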

It is plain, if this be so, that 9 is not the number which is really most signi- 
ficant in the problem. The facts that just two numbers require 9 cubes, and, 
if it is a fact, that just 15 more require 8, are, so to say, arithmetical flukes, 
depending on comparatively trivial idiosyncrasies of special numbers. 
The most fundamental and most difficult problem is that of deciding, not 
how many cubes are required for the representation of all numbers, but 
how many are required for the representation of all large numbers, i.e. of 
all numbers with some finite number of exceptions. 

We define $G(k)$ as the least value of $s$ for which it is true that all
sufficiently large numbers, i.e. all numbers with at most a finite number of
exceptions, are representable by $s$ $k$th powers. Thus $G(3) \leq 7$. On the other
hand, as we shall see in the next chapter, $G(3) \geq 4$; there are infinitely
many numbers not representable by three cubes. Thus $G(3)$ is 4, 5, 6, or 7;
it is still not known which.

It is plain that

\[ G(k) \leq g(k) \]

for every k. In general, G(k) is much smaller than g(k), the value of g(k) 
being swollen by the difficulty of representing certain comparatively small 
numbers. 

20.2. Squares. In this chapter we confine ourselves to the case $k = 2$.
Our main theorem is Theorem 369, which, combined with the trivial result†
that no number of the form $8m + 7$ can be the sum of three squares, shows
that

\[ g(2) = G(2) = 4. \]

We give three proofs of this fundamental theorem. The first (§20.5) is
elementary and depends on the 'method of descent', due in principle to
Fermat. The second (§§20.6-9) depends on the arithmetic of quaternions.
The third (§§20.11-12) depends on an identity which belongs properly to
the theory of elliptic functions (though we prove it by elementary algebra),‡
and gives a formula for the number of representations.

But before we do this, we return for a time to the problem of the 
representation of a number by two squares. 

Theorem 366. A number $n$ is the sum of two squares if and only if all
prime factors of $n$ of the form $4m + 3$ have even exponents in the standard
form of $n$.

This theorem is an immediate consequence of (16.9.5) and Theorem 278.
There are, however, other proofs of Theorem 366, some independent of 
the arithmetic of k(i), which involve interesting and important ideas. 
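Theorem 366 can be tested directly on small numbers. The sketch below is only an illustration; the function names and the range of $n$ are choices made for the example. It compares a brute-force search for $x^2 + y^2 = n$ with the criterion on the exponents of the primes $4m + 3$.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 soluble in integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        y2 = n - x * x
        if isqrt(y2) ** 2 == y2:
            return True
    return False

def criterion(n):
    """Theorem 366: every prime factor 4m+3 of n occurs to an even power."""
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # what is left is 1 or a single prime occurring to the first power
    return n % 4 != 3
```

For example, $45 = 3^2 \cdot 5 = 6^2 + 3^2$ passes both tests, while $21 = 3 \cdot 7$ fails both.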

20.3. Second proof of Theorem 366. We have to prove that $n$ is of the
form $x^2 + y^2$ if and only if

\[ n = n_1^2 n_2, \tag{20.3.1} \]

where $n_2$ has no prime factors of the form $4m + 3$.

We say that 

n = x 2 + y 2 

is a primitive representation of n if (x,y) = 1, and otherwise an imprimitive 
representation. 


† See §20.10.

‡ See the footnote to p. 372.
Theorem 367. If $p = 4m + 3$ and $p \mid n$, then $n$ has no primitive
representations.

If $n$ has a primitive representation, then

\[ p \mid (x^2 + y^2), \quad (x, y) = 1, \]

and so $p \nmid x$, $p \nmid y$. Hence, by Theorem 57, there is a number $l$ such that
$y \equiv lx \pmod p$ and so

\[ x^2(1 + l^2) \equiv x^2 + y^2 \equiv 0 \pmod p. \]

It follows that

\[ 1 + l^2 \equiv 0 \pmod p \]

and therefore that $-1$ is a quadratic residue of $p$, which contradicts
Theorem 82.

Theorem 368. If $p = 4m + 3$, $p^c \mid n$, $p^{c+1} \nmid n$, and $c$ is odd, then $n$ has
no representations (primitive or imprimitive).

Suppose that $n = x^2 + y^2$, $(x, y) = d$; and let $p^\gamma$ be the highest power
of $p$ which divides $d$. Then

\[ x = dX, \quad y = dY, \quad (X, Y) = 1, \]
\[ n = d^2(X^2 + Y^2) = d^2 N, \]

say. The index of the highest power of $p$ which divides $N$ is $c - 2\gamma$, which
is positive because $c$ is odd. Hence

\[ N = X^2 + Y^2, \quad (X, Y) = 1, \quad p \mid N; \]

which contradicts Theorem 367.

It remains to prove that $n$ is representable when $n$ is of the form (20.3.1),
and it is plainly enough to prove $n_2$ representable. Also

\[ (x_1^2 + y_1^2)(x_2^2 + y_2^2) = (x_1 x_2 + y_1 y_2)^2 + (x_1 y_2 - x_2 y_1)^2, \]

so that the product of two representable numbers is itself representable.
Since $2 = 1^2 + 1^2$ is representable, the problem is reduced to that of proving
Theorem 251, i.e. of proving that if $p = 4m + 1$, then $p$ is representable.
Since $-1$ is a quadratic residue of such a $p$, there is an $l$ for which

\[ l^2 \equiv -1 \pmod p. \]
Taking $n = [\sqrt p]$ in Theorem 36, we see that there are integers $a$ and $b$
such that

\[ 0 < b < \sqrt p, \qquad \left| -\frac{l}{p} - \frac{a}{b} \right| < \frac{1}{b\sqrt p}. \]

If we write

\[ c = lb + pa, \]

then

\[ |c| < \sqrt p, \qquad 0 < b^2 + c^2 < 2p. \]

But $c \equiv lb \pmod p$, and so

\[ b^2 + c^2 \equiv b^2 + l^2 b^2 \equiv b^2(1 + l^2) \equiv 0 \pmod p; \]

and therefore

\[ b^2 + c^2 = p. \]
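The argument is constructive once $l$ is known: one looks for $b$ with $0 < b < \sqrt p$ whose residue $c \equiv lb \pmod p$, taken of least absolute value, satisfies $|c| < \sqrt p$. A minimal sketch, assuming small primes so that $l$ can be found by direct search; the function name is illustrative.

```python
from math import isqrt

def two_squares_via_approximation(p):
    """For a prime p = 4m+1, return (b, c) with b^2 + c^2 = p."""
    # l with l^2 = -1 (mod p), found by direct search (adequate for small p)
    l = next((t for t in range(1, p) if (t * t + 1) % p == 0), None)
    if l is None:
        raise ValueError("-1 is not a quadratic residue of p")
    for b in range(1, isqrt(p) + 1):        # 0 < b < sqrt(p), p not a square
        c = (l * b) % p                      # c = lb (mod p)
        if c > p // 2:
            c -= p                           # least absolute residue: c = lb + pa
        if c * c < p:                        # |c| < sqrt(p)
            return b, abs(c)
    raise ValueError("no representation found; is p prime?")
```

Theorem 36 guarantees that the loop succeeds for every prime $p = 4m + 1$; for instance `two_squares_via_approximation(13)` returns a pair whose squares sum to 13.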

20.4. Third and fourth proofs of Theorem 366. (1) Another proof
of Theorem 366, due (in principle at any rate) to Fermat, is based on the
'method of descent'. To prove that $p = 4m + 1$ is representable, we prove (i)
that some multiple of $p$ is representable, and (ii) that the least representable
multiple of $p$ must be $p$ itself. The rest of the proof is the same.

By Theorem 86, there are numbers $x$, $y$ such that

\[ x^2 + y^2 = mp, \quad p \nmid x, \quad p \nmid y, \tag{20.4.1} \]

and $0 < m < p$. Let $m_0$ be the least value of $m$ for which (20.4.1) is soluble,
and write $m_0$ for $m$ in (20.4.1). If $m_0 = 1$, our theorem is proved.

If $m_0 > 1$, then $1 < m_0 < p$. Now $m_0$ cannot divide both $x$ and $y$, since
this would involve

\[ m_0^2 \mid (x^2 + y^2) \to m_0^2 \mid m_0 p \to m_0 \mid p. \]

Hence we can choose $c$ and $d$ so that

\[ x_1 = x - c m_0, \quad y_1 = y - d m_0, \]
\[ |x_1| \leq \tfrac12 m_0, \quad |y_1| \leq \tfrac12 m_0, \quad x_1^2 + y_1^2 > 0, \]

and therefore

\[ 0 < x_1^2 + y_1^2 \leq 2\left(\tfrac12 m_0\right)^2 < m_0^2. \tag{20.4.2} \]

Now

\[ x_1^2 + y_1^2 \equiv x^2 + y^2 \equiv 0 \pmod{m_0} \]

or

\[ x_1^2 + y_1^2 = m_1 m_0, \tag{20.4.3} \]

where $0 < m_1 < m_0$, by (20.4.2). Multiplying (20.4.3) by (20.4.1), with
$m = m_0$, we obtain

\[ m_0^2 m_1 p = (x^2 + y^2)(x_1^2 + y_1^2) = (x x_1 + y y_1)^2 + (x y_1 - x_1 y)^2. \]

But

\[ x x_1 + y y_1 = x(x - c m_0) + y(y - d m_0) = m_0 X, \]
\[ x y_1 - x_1 y = x(y - d m_0) - y(x - c m_0) = m_0 Y, \]

where $X = p - cx - dy$, $Y = cy - dx$. Hence

\[ m_1 p = X^2 + Y^2 \quad (0 < m_1 < m_0), \]

which contradicts the definition of $m_0$. It follows that $m_0$ must be 1.
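The descent can be run as an algorithm: instead of assuming $m_0$ minimal, one starts from any representable multiple $mp$ with $m < p$ and repeats the step $m \to m_1$ until $m = 1$. The following is a sketch under that reading, with illustrative names; the starting point $x^2 + 1 \equiv 0 \pmod p$ is found by direct search, which is only sensible for small $p$.

```python
def two_squares_by_descent(p):
    """For a prime p = 4m+1, return (x, y) with x^2 + y^2 = p."""
    # smallest x with x^2 + 1 = 0 (mod p); then x < p/2 and x^2 + 1 = m*p, m < p
    x = next(t for t in range(1, p) if (t * t + 1) % p == 0)
    y = 1
    m = (x * x + y * y) // p
    while m > 1:
        # x1 = x - c*m, y1 = y - d*m with |x1|, |y1| <= m/2
        x1 = (x + m // 2) % m - m // 2
        y1 = (y + m // 2) % m - m // 2
        # both combinations are divisible by m, as in the text
        X = (x * x1 + y * y1) // m
        Y = (x * y1 - x1 * y) // m
        x, y = abs(X), abs(Y)
        m = (x * x + y * y) // p             # this is m1 < m
    return x, y
```

Each pass replaces $m$ by some $m_1$ with $0 < m_1 < m$, so the loop terminates with $x^2 + y^2 = p$.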

(2) A fourth proof, due to Grace, depends on the ideas of Ch. III.

By Theorem 82, there is a number $l$ for which

\[ l^2 + 1 \equiv 0 \pmod p. \]

We consider the points $(x, y)$ of the fundamental lattice $\Lambda$ which satisfy

\[ y \equiv lx \pmod p. \]

These points define a lattice $M$.† It is easy to see that the proportion of points
of $\Lambda$, in a large circle round the origin, which belong to $M$ is asymptotically
$1/p$, and that the area of a fundamental parallelogram of $M$ is therefore $p$.

Suppose that $A$ or $(\xi, \eta)$ is one of the points of $M$ nearest to the origin.
Then $\eta \equiv l\xi$ and so

\[ -\xi \equiv l^2 \xi \equiv l\eta \pmod p, \]

and therefore $B$ or $(-\eta, \xi)$ is also a point of $M$. There is no point of $M$ inside
the triangle $OAB$, and therefore none within the square with sides $OA$, $OB$.
Hence this square is a fundamental parallelogram of $M$, and therefore its
area is $p$. It follows that

\[ \xi^2 + \eta^2 = p. \]

† We state the proof shortly, leaving some details to the reader.

20.5. The four-square theorem. We pass now to the principal theorem 
of this chapter. 

Theorem 369 (Lagrange’s theorem). Every positive integer is the sum 
of four squares. 

Since

\[
\begin{aligned}
&(x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2) \\
&\quad = (x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4)^2 + (x_1 y_2 - x_2 y_1 + x_3 y_4 - x_4 y_3)^2 \\
&\qquad + (x_1 y_3 - x_3 y_1 + x_4 y_2 - x_2 y_4)^2 + (x_1 y_4 - x_4 y_1 + x_2 y_3 - x_3 y_2)^2,
\end{aligned}
\tag{20.5.1}
\]

the product of two representable numbers is itself representable. Also
$1 = 1^2 + 0^2 + 0^2 + 0^2$. Hence Theorem 369 will follow from

Theorem 370. Any prime p is the sum of four squares. 

Our first proof proceeds on the same lines as the proof of Theorem 366
in §20.4 (1). Since $2 = 1^2 + 1^2 + 0^2 + 0^2$, we can take $p > 2$.

It follows from Theorem 87 that there is a multiple of $p$, say $mp$, such
that

\[ mp = x_1^2 + x_2^2 + x_3^2 + x_4^2, \]

with $x_1, x_2, x_3, x_4$ not all divisible by $p$; and we have to prove that the least
such multiple of $p$ is $p$ itself.

Let $m_0 p$ be the least such multiple. If $m_0 = 1$, there is nothing more to
prove; we suppose therefore that $m_0 > 1$. By Theorem 87, $m_0 < p$.

If $m_0$ is even, then $x_1 + x_2 + x_3 + x_4$ is even and so either (i) $x_1, x_2, x_3,
x_4$ are all even, or (ii) they are all odd, or (iii) two are even and two are
odd. In the last case, let us suppose that $x_1, x_2$ are even and $x_3, x_4$ are odd.
Then in all three cases

\[ x_1 + x_2, \quad x_1 - x_2, \quad x_3 + x_4, \quad x_3 - x_4 \]

are all even, and so

\[
\tfrac12 m_0 p = \left(\frac{x_1 + x_2}{2}\right)^2 + \left(\frac{x_1 - x_2}{2}\right)^2
+ \left(\frac{x_3 + x_4}{2}\right)^2 + \left(\frac{x_3 - x_4}{2}\right)^2
\]

is the sum of four integral squares. These squares are not all divisible by
$p$, since $x_1, x_2, x_3, x_4$ are not all divisible by $p$. But this contradicts our
definition of $m_0$. Hence $m_0$ must be odd.

Next, $x_1, x_2, x_3, x_4$ are not all divisible by $m_0$, since this would imply

\[ m_0^2 \mid m_0 p \to m_0 \mid p, \]

which is impossible. Also $m_0$ is odd, and therefore at least 3. We can
therefore choose $b_1, b_2, b_3, b_4$ so that

\[ y_i = x_i - b_i m_0 \quad (i = 1, 2, 3, 4) \]

satisfy

\[ |y_i| < \tfrac12 m_0, \qquad y_1^2 + y_2^2 + y_3^2 + y_4^2 > 0. \]

Then

\[ 0 < y_1^2 + y_2^2 + y_3^2 + y_4^2 < 4\left(\tfrac12 m_0\right)^2 = m_0^2, \]

and

\[ y_1^2 + y_2^2 + y_3^2 + y_4^2 \equiv 0 \pmod{m_0}. \]

It follows that

\[ x_1^2 + x_2^2 + x_3^2 + x_4^2 = m_0 p \quad (m_0 < p), \]
\[ y_1^2 + y_2^2 + y_3^2 + y_4^2 = m_0 m_1 \quad (0 < m_1 < m_0); \]

and so, by (20.5.1),

\[ m_0^2 m_1 p = z_1^2 + z_2^2 + z_3^2 + z_4^2, \tag{20.5.2} \]

where $z_1, z_2, z_3, z_4$ are the four numbers which occur on the right-hand side
of (20.5.1). But

\[ z_1 = \sum x_i y_i = \sum x_i (x_i - b_i m_0) \equiv \sum x_i^2 \equiv 0 \pmod{m_0}; \]

and similarly $z_2, z_3, z_4$ are divisible by $m_0$. We may therefore write

\[ z_i = m_0 t_i \quad (i = 1, 2, 3, 4); \]

and then (20.5.2) becomes

\[ m_1 p = t_1^2 + t_2^2 + t_3^2 + t_4^2, \]

which contradicts the definition of $m_0$ because $m_1 < m_0$.

It follows that $m_0 = 1$.
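As with two squares, the proof can be turned into a procedure: start from a multiple $mp$ with $m < p$ supplied by Theorem 87, halve $m$ when it is even, and otherwise reduce the $x_i$ modulo $m$ and apply the identity (20.5.1). The sketch below is illustrative only; the function names are hypothetical and the starting solution of $1 + r^2 + s^2 \equiv 0 \pmod p$ is found by direct search, so it is meant for small odd primes.

```python
def euler_product(x, y):
    """The four numbers z1..z4 on the right-hand side of (20.5.1)."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4,
            x1 * y2 - x2 * y1 + x3 * y4 - x4 * y3,
            x1 * y3 - x3 * y1 + x4 * y2 - x2 * y4,
            x1 * y4 - x4 * y1 + x2 * y3 - x3 * y2)

def four_squares_prime(p):
    """Return four integers whose squares sum to the odd prime p."""
    # 1 + r^2 + s^2 = 0 (mod p) with 0 <= r, s <= (p-1)/2, so that m < p
    half = (p + 1) // 2
    x = next((1, r, s, 0) for r in range(half) for s in range(half)
             if (1 + r * r + s * s) % p == 0)
    m = sum(t * t for t in x) // p
    while m > 1:
        if m % 2 == 0:
            # pair the coordinates by parity, then halve (the even case)
            a, b, c, d = sorted(x, key=lambda t: t % 2)
            x = ((a + b) // 2, (a - b) // 2, (c + d) // 2, (c - d) // 2)
        else:
            # y_i = x_i - b_i*m with |y_i| < m/2, then divide each z_i by m
            y = tuple((t + m // 2) % m - m // 2 for t in x)
            x = tuple(z // m for z in euler_product(x, y))
        m = sum(t * t for t in x) // p
    return tuple(abs(t) for t in x)
```

For $p = 7$ the search gives $1 + 2^2 + 3^2 = 14 = 2 \cdot 7$, and one halving step yields $7 = 1^2 + 1^2 + 2^2 + 1^2$.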
20.6. Quaternions. In Ch. XV we deduced Theorem 251 from the
arithmetic of the Gaussian integers, a subclass of the complex numbers of
ordinary analysis. There is a proof of Theorem 370 based on ideas which
are similar, but more sophisticated because we use numbers which do not
obey all the laws of ordinary algebra.

Quaternions† are 'hyper-complex' numbers of a special kind. The
numbers of the system are of the form

\[ \alpha = a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3, \tag{20.6.1} \]

where $a_0, a_1, a_2, a_3$ are real numbers (the coordinates of $\alpha$), and $i_1, i_2, i_3$
elements characteristic of the system. Two quaternions are equal if their
coordinates are equal.

These numbers are combined according to rules which resemble those of
ordinary algebra in all respects but one. There are, as in ordinary algebra,
operations of addition and multiplication. The laws of addition are the same
as in ordinary algebra; thus

\[
\begin{aligned}
\alpha + \beta &= (a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3) + (b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) \\
&= (a_0 + b_0) + (a_1 + b_1) i_1 + (a_2 + b_2) i_2 + (a_3 + b_3) i_3.
\end{aligned}
\]

Multiplication is associative and distributive, but not generally
commutative. It is commutative for the coordinates, and between the coordinates
and $i_1, i_2, i_3$; but

\[
\begin{cases}
i_1^2 = i_2^2 = i_3^2 = -1, \\
i_2 i_3 = i_1 = -i_3 i_2, \quad i_3 i_1 = i_2 = -i_1 i_3, \quad i_1 i_2 = i_3 = -i_2 i_1.
\end{cases}
\tag{20.6.2}
\]

Generally,

\[
\begin{aligned}
\alpha\beta &= (a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3)(b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) \\
&= c_0 + c_1 i_1 + c_2 i_2 + c_3 i_3,
\end{aligned}
\tag{20.6.3}
\]

where

\[
\begin{aligned}
c_0 &= a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3, \\
c_1 &= a_0 b_1 + a_1 b_0 + a_2 b_3 - a_3 b_2, \\
c_2 &= a_0 b_2 - a_1 b_3 + a_2 b_0 + a_3 b_1, \\
c_3 &= a_0 b_3 + a_1 b_2 - a_2 b_1 + a_3 b_0.
\end{aligned}
\tag{20.6.4}
\]

† We take the elements of the algebra of quaternions for granted. A reader who knows nothing of
quaternions, but accepts what is stated here, will be able to follow §§20.7-9.

In particular,

\[
(a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3)(a_0 - a_1 i_1 - a_2 i_2 - a_3 i_3)
= a_0^2 + a_1^2 + a_2^2 + a_3^2, \tag{20.6.5}
\]

the coefficients of $i_1, i_2, i_3$ in the product being zero.
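The multiplication rule (20.6.4) translates directly into code. In the sketch below a quaternion is represented by the 4-tuple of its coordinates; this representation and the names `qmul`, `conj` and `norm` are choices made for the illustration, not notation from the text.

```python
def qmul(a, b):
    """Product of two quaternions given by their coordinates, as in (20.6.4)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def conj(a):
    """Change the sign of the coefficients of i1, i2, i3."""
    return (a[0], -a[1], -a[2], -a[3])

def norm(a):
    """Sum of the squares of the coordinates."""
    return sum(t * t for t in a)

I1, I2, I3 = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
```

With these definitions `qmul(I2, I3)` gives `I1` while `qmul(I3, I2)` gives its negative, as in (20.6.2), and `qmul(a, conj(a))` has first coordinate `norm(a)` and the other three zero, as in (20.6.5).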

We shall say that the quaternion $\alpha$ is integral if $a_0, a_1, a_2, a_3$ are either
(i) all rational integers or (ii) all halves of odd rational integers. We are
interested only in integral quaternions; and henceforth we use 'quaternion'
to mean 'integral quaternion'. We shall use Greek letters for quaternions,
except that, when $a_1 = a_2 = a_3 = 0$ and so $\alpha = a_0$, we shall use $a_0$ both
for the quaternion

\[ a_0 + 0 \cdot i_1 + 0 \cdot i_2 + 0 \cdot i_3 \]

and for the rational integer $a_0$.

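The quaternion

\[ \bar\alpha = a_0 - a_1 i_1 - a_2 i_2 - a_3 i_3 \tag{20.6.6} \]

is called the conjugate of $\alpha = a_0 + a_1 i_1 + a_2 i_2 + a_3 i_3$, and

\[ N\alpha = \alpha\bar\alpha = \bar\alpha\alpha = a_0^2 + a_1^2 + a_2^2 + a_3^2 \tag{20.6.7} \]

the norm of $\alpha$. The norm of an integral quaternion is a rational integer. We
shall say that $\alpha$ is odd or even according as $N\alpha$ is odd or even.

It follows from (20.6.3), (20.6.4), and (20.6.6) that

\[ \overline{\alpha\beta} = \bar\beta\,\bar\alpha, \]

and so

\[
N(\alpha\beta) = \alpha\beta \cdot \overline{\alpha\beta} = \alpha\beta \cdot \bar\beta\bar\alpha
= \alpha \cdot N\beta \cdot \bar\alpha = \alpha\bar\alpha \cdot N\beta = N\alpha\, N\beta. \tag{20.6.8}
\]

We define $\alpha^{-1}$, when $\alpha \neq 0$, by

\[ \alpha^{-1} = \frac{\bar\alpha}{N\alpha}, \tag{20.6.9} \]

so that

\[ \alpha\alpha^{-1} = \alpha^{-1}\alpha = 1. \tag{20.6.10} \]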
If $\alpha$ and $\alpha^{-1}$ are both integral, then we say that $\alpha$ is a unity, and write
$\alpha = \epsilon$. Since $\epsilon\epsilon^{-1} = 1$, $N\epsilon\, N\epsilon^{-1} = 1$ and so $N\epsilon = 1$. Conversely, if $\alpha$
is integral and $N\alpha = 1$, then $\alpha^{-1} = \bar\alpha$ is also integral, so that $\alpha$ is a unity.
Thus a unity may be defined alternatively as an integral quaternion whose
norm is 1.

If $a_0, a_1, a_2, a_3$ are all integral, and $a_0^2 + a_1^2 + a_2^2 + a_3^2 = 1$, then one of
$a_0^2, \ldots$ must be 1 and the rest 0. If they are all halves of odd integers, then
each of $a_0^2, \ldots$ must be $\frac14$. Hence there are just 24 unities, viz.

\[ \pm 1, \quad \pm i_1, \quad \pm i_2, \quad \pm i_3, \quad \tfrac12(\pm 1 \pm i_1 \pm i_2 \pm i_3). \tag{20.6.11} \]

If we write

\[ \rho = \tfrac12(1 + i_1 + i_2 + i_3), \tag{20.6.12} \]

then any integral quaternion may be expressed in the form

\[ k_0 \rho + k_1 i_1 + k_2 i_2 + k_3 i_3, \tag{20.6.13} \]

where $k_0, k_1, k_2, k_3$ are rational integers; and any quaternion of this form is
integral. It is plain that the sum of any two integral quaternions is integral.
Also, after (20.6.3) and (20.6.4),

\[ \rho^2 = \tfrac12(-1 + i_1 + i_2 + i_3) = \rho - 1, \]
\[ \rho i_1 = \tfrac12(-1 + i_1 + i_2 - i_3) = -\rho + i_1 + i_2, \]
\[ i_1 \rho = \tfrac12(-1 + i_1 - i_2 + i_3) = -\rho + i_1 + i_3, \]

with similar expressions for $\rho i_2$, etc. Hence all these products are integral,
and therefore the product of any two integral quaternions is integral.
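The list (20.6.11) and the products involving $\rho$ can be verified mechanically with exact rational arithmetic. This is a sketch for illustration only: quaternions are again coordinate 4-tuples, and the names `qmul`, `units`, `rho` and `i1` are assumptions of the example.

```python
from fractions import Fraction
from itertools import product

def qmul(a, b):
    """Quaternion product in coordinates, following (20.6.4)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

half = Fraction(1, 2)
# the 8 unities with integral coordinates: +-1, +-i1, +-i2, +-i3
units = [tuple(s if i == k else 0 for i in range(4))
         for k in range(4) for s in (1, -1)]
# the 16 unities (+-1 +- i1 +- i2 +- i3)/2
units += [tuple(s * half for s in signs)
          for signs in product((1, -1), repeat=4)]

rho = (half, half, half, half)
i1 = (0, 1, 0, 0)
```

The list has 24 distinct members of norm 1, it is closed under `qmul`, and `qmul(rho, rho)` has coordinates $(-\frac12, \frac12, \frac12, \frac12)$, that is $\rho - 1$.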

If $\epsilon$ is any unity, then $\epsilon\alpha$ and $\alpha\epsilon$ are said to be associates of $\alpha$. Associates
have equal norms, and the associates of an integral quaternion are integral.

If $\gamma = \alpha\beta$, then $\gamma$ is said to have $\alpha$ as a left-hand divisor and $\beta$ as a
right-hand divisor. If $\alpha = a_0$ or $\beta = b_0$, then $\alpha\beta = \beta\alpha$ and the distinction
of right and left is unnecessary.

20.7. Preliminary theorems about integral quaternions. Our second
proof of Theorem 370 is similar in principle to that of Theorem 251
contained in §§12.8 and 15.1. We need some preliminary theorems.

Theorem 371. If $\alpha$ is an integral quaternion, then one at least of its
associates has integral coordinates; and if $\alpha$ is odd, then one at least of its
associates has non-integral coordinates.
(1) If the coordinates of $\alpha$ itself are not integral, then we can choose the
signs so that

\[ \alpha = (b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) + \tfrac12(\pm 1 \pm i_1 \pm i_2 \pm i_3) = \beta + \gamma, \]

say, where $b_0, b_1, b_2, b_3$ are even. Any associate of $\beta$ has integral
coordinates, and $\gamma\bar\gamma$, an associate of $\gamma$, is 1. Hence $\alpha\bar\gamma$, an associate of $\alpha$, has
integral coordinates.

(2) If $\alpha$ is odd, and has integral coordinates, then

\[ \alpha = (b_0 + b_1 i_1 + b_2 i_2 + b_3 i_3) + (c_0 + c_1 i_1 + c_2 i_2 + c_3 i_3) = \beta + \gamma, \]

say, where $b_0, b_1, b_2, b_3$ are even, each of $c_0, c_1, c_2, c_3$ is 0 or 1, and (since
$N\alpha$ is odd) either one is 1 or three are. Any associate of $\beta$ has integral
coordinates. It is therefore sufficient to prove that each of the quaternions

\[ 1, \quad i_1, \quad i_2, \quad i_3, \quad 1 + i_2 + i_3, \quad 1 + i_1 + i_3, \quad 1 + i_1 + i_2, \quad i_1 + i_2 + i_3 \]

has an associate with non-integral coordinates, and this is easily verified.
Thus, if $\gamma = i_1$ then $\gamma\rho$ has non-integral coordinates. If

\[ \gamma = 1 + i_2 + i_3 = (1 + i_1 + i_2 + i_3) - i_1 = \lambda + \mu \]

or

\[ \gamma = i_1 + i_2 + i_3 = (1 + i_1 + i_2 + i_3) - 1 = \lambda + \mu, \]

then

\[ \lambda\epsilon = \lambda \cdot \tfrac12(1 - i_1 - i_2 - i_3) = 2 \]

and the coordinates of $\mu\epsilon$ are non-integral.

Theorem 372. If $\kappa$ is an integral quaternion, and $m$ a positive integer,
then there is an integral quaternion $\lambda$ such that

\[ N(\kappa - m\lambda) < m^2. \]

The case $m = 1$ is trivial, and we may suppose $m > 1$. We use the form
(20.6.13) of an integral quaternion, and write

\[ \kappa = k_0 \rho + k_1 i_1 + k_2 i_2 + k_3 i_3, \qquad \lambda = l_0 \rho + l_1 i_1 + l_2 i_2 + l_3 i_3, \]

where $k_0, \ldots, l_0, \ldots$ are integers. The coordinates of $\kappa - m\lambda$ are

\[
\tfrac12(k_0 - m l_0), \quad \tfrac12\{k_0 + 2k_1 - m(l_0 + 2l_1)\}, \quad
\tfrac12\{k_0 + 2k_2 - m(l_0 + 2l_2)\}, \quad \tfrac12\{k_0 + 2k_3 - m(l_0 + 2l_3)\}.
\]

We can choose $l_0, l_1, l_2, l_3$ in succession so that these have absolute values
not exceeding $\frac14 m, \frac12 m, \frac12 m, \frac12 m$; and then

\[ N(\kappa - m\lambda) \leq \tfrac1{16} m^2 + 3 \cdot \tfrac14 m^2 < m^2. \]

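Theorem 373. If $\alpha$ and $\beta$ are integral quaternions, and $\beta \neq 0$, then
there are integral quaternions $\lambda$ and $\gamma$ such that

\[ \alpha = \lambda\beta + \gamma, \qquad N\gamma < N\beta. \]

We take

\[ \kappa = \alpha\bar\beta, \qquad m = \beta\bar\beta = N\beta, \]

and determine $\lambda$ as in Theorem 372. Then

\[ (\alpha - \lambda\beta)\bar\beta = \kappa - \lambda m = \kappa - m\lambda, \]
\[ N(\alpha - \lambda\beta)\, N\bar\beta = N(\kappa - m\lambda) < m^2, \]
\[ N\gamma = N(\alpha - \lambda\beta) < m = N\beta. \]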
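Theorems 372 and 373 together give a division algorithm, and the choice of $l_0, l_1, l_2, l_3$ 'in succession' can be made explicit. The following sketch follows that recipe with exact fractions; quaternions are coordinate 4-tuples and all function names are illustrative assumptions.

```python
from fractions import Fraction

def qmul(a, b):
    """Quaternion product in coordinates, following (20.6.4)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def conj(a):
    return (a[0], -a[1], -a[2], -a[3])

def norm(a):
    return sum(t * t for t in a)

def nearest(n, d):
    """An integer q with |n - q*d| <= d/2, for integers n and d > 0."""
    return (2 * n + d) // (2 * d)

def reduce_mod(kappa, m):
    """Theorem 372: an integral lam with N(kappa - m*lam) < m^2."""
    a0, a1, a2, a3 = kappa
    # kappa = k0*rho + k1*i1 + k2*i2 + k3*i3, i.e. a0 = k0/2, aj = k0/2 + kj
    k0 = int(2 * a0)
    ks = [int(a - a0) for a in (a1, a2, a3)]
    l0 = nearest(k0, m)                                  # |k0 - m*l0| <= m/2
    ls = [nearest(k0 - m * l0 + 2 * k, 2 * m) for k in ks]
    return (Fraction(l0, 2),) + tuple(Fraction(l0, 2) + l for l in ls)

def right_divide(alpha, beta):
    """Theorem 373: (lam, gamma) with alpha = lam*beta + gamma, N(gamma) < N(beta)."""
    m = int(norm(beta))
    lam = reduce_mod(qmul(alpha, conj(beta)), m)
    gamma = tuple(x - y for x, y in zip(alpha, qmul(lam, beta)))
    return lam, gamma
```

Because the remainder always has smaller norm than the divisor, repeated calls to `right_divide` would give the 'Euclidean algorithm' mentioned in §20.8; the sketch stops at a single division step.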

20.8. The highest common right-hand divisor of two quaternions.

We shall say that two integral quaternions $\alpha$ and $\beta$ have a highest common
right-hand divisor $\delta$ if (i) $\delta$ is a right-hand divisor of $\alpha$ and $\beta$, and (ii) every
right-hand divisor of $\alpha$ and $\beta$ is a right-hand divisor of $\delta$; and we shall prove
that any two integral quaternions, not both 0, have a highest common
right-hand divisor which is effectively unique. We could use Theorem 373 for
the construction of a 'Euclidean algorithm' similar to those of §§12.3 and
12.8, but it is simpler to use ideas like those of §§2.9 and 15.7.

We call a system $S$ of integral quaternions, one of which is not 0, a
right-ideal if it has the properties

(i) $\alpha \in S$, $\beta \in S \to \alpha \pm \beta \in S$,

(ii) $\alpha \in S \to \lambda\alpha \in S$ for all integral quaternions $\lambda$:

the latter property corresponds to the characteristic property of the ideals
of §15.7. If $\delta$ is any integral quaternion, and $S$ is the set $(\lambda\delta)$ of all
left-hand multiples of $\delta$ by integral quaternions $\lambda$, then it is plain that $S$ is a
right-ideal. We call such a right-ideal a principal right-ideal.

Theorem 374. Every right-ideal is a principal right-ideal. 

Among the members of $S$, not 0, there are some with minimum norm:
we call one of these $\delta$. If $\gamma \in S$, $N\gamma < N\delta$ then $\gamma = 0$.

If $\alpha \in S$ then $\alpha - \lambda\delta \in S$, for every integral $\lambda$, by (i) and (ii). By
Theorem 373, we can choose $\lambda$ so that $N\gamma = N(\alpha - \lambda\delta) < N\delta$. But then
$\gamma = 0$, $\alpha = \lambda\delta$, and so $S$ is the principal right-ideal $(\lambda\delta)$.

We can now prove 

Theorem 375. Any two integral quaternions $\alpha$ and $\beta$, not both 0, have a
highest common right-hand divisor $\delta$, which is unique except for a left-hand
unit factor, and can be expressed in the form

\[ \delta = \mu\alpha + \nu\beta, \tag{20.8.1} \]

where $\mu$ and $\nu$ are integral.

The set $S$ of all quaternions $\mu\alpha + \nu\beta$ is plainly a right-ideal which, by
Theorem 374, is the principal right-ideal formed by all integral multiples
$\lambda\delta$ of a certain $\delta$. Since $S$ includes $\delta$, $\delta$ can be expressed in the form (20.8.1).
Since $S$ includes $\alpha$ and $\beta$, $\delta$ is a common right-hand divisor of $\alpha$ and $\beta$;
and any such divisor is a right-hand divisor of every member of $S$, and
therefore of $\delta$. Hence $\delta$ is a highest common right-hand divisor of $\alpha$ and $\beta$.

Finally, if both $\delta$ and $\delta'$ satisfy the conditions, $\delta' = \lambda\delta$ and $\delta = \lambda'\delta'$,
where $\lambda$ and $\lambda'$ are integral. Hence $\delta = \lambda'\lambda\delta$, $1 = \lambda'\lambda$, and $\lambda$ and $\lambda'$ are
unities.

If $\delta$ is a unity $\epsilon$, then all highest common right-hand divisors of $\alpha$ and $\beta$
are unities. In this case

\[ \mu'\alpha + \nu'\beta = \epsilon, \]

for some integral $\mu'$, $\nu'$; and

\[ (\epsilon^{-1}\mu')\alpha + (\epsilon^{-1}\nu')\beta = 1; \]

so that

\[ \mu\alpha + \nu\beta = 1, \tag{20.8.2} \]

for some integral $\mu$, $\nu$. We then write

\[ (\alpha, \beta)_r = 1. \tag{20.8.3} \]

We could of course establish a similar theory of the highest common
left-hand divisor.

If $\alpha$ and $\beta$ have a common right-hand divisor $\delta$, not a unity, then $N\alpha$ and
$N\beta$ have the common right-hand divisor $N\delta > 1$. There is one important
case in which the converse is true.
Theorem 376. If $\alpha$ is integral and $\beta = m$, a positive rational integer, then
a necessary and sufficient condition that $(\alpha, \beta)_r = 1$ is that $(N\alpha, N\beta) = 1$,
or (what is the same thing) that $(N\alpha, m) = 1$.

For if $(\alpha, \beta)_r = 1$ then (20.8.2) is true for appropriate $\mu$, $\nu$. Hence

\[ N(\mu\alpha) = N(1 - \nu\beta) = (1 - m\nu)(1 - m\bar\nu), \]
\[ N\mu\, N\alpha = 1 - m\nu - m\bar\nu + m^2 N\nu, \]

and $(N\alpha, m)$ divides every term in this equation except 1. Hence
$(N\alpha, m) = 1$. Since $N\beta = m^2$, the two forms of the condition are equivalent.

20.9. Prime quaternions and the proof of Theorem 370. An integral
quaternion $\pi$, not a unity, is said to be prime if its only divisors are the
unities and its associates, i.e. if $\pi = \alpha\beta$ implies that either $\alpha$ or $\beta$ is a
unity. It is plain that all associates of a prime are prime. If $\pi = \alpha\beta$, then
$N\pi = N\alpha\, N\beta$, so that $\pi$ is certainly prime if $N\pi$ is a rational prime. We
shall prove that the converse is also true.

Theorem 377. An integral quaternion $\pi$ is prime if and only if its norm
$N\pi$ is a rational prime.

Since $Np = p^2$, a particular case of Theorem 377 is

Theorem 378. A rational prime $p$ cannot be a prime quaternion.

We begin by proving Theorem 378 (which is all that we shall actually 
need). 

Since

\[ 2 = (1 + i_1)(1 - i_1), \]

2 is not a prime quaternion. We may therefore suppose $p$ odd.

By Theorem 87, there are integers $r$ and $s$ such that

\[ 0 < r < p, \quad 0 < s < p, \quad 1 + r^2 + s^2 \equiv 0 \pmod p. \]

If

\[ \alpha = 1 + s i_2 - r i_3, \]

then

\[ N\alpha = 1 + r^2 + s^2 \equiv 0 \pmod p, \]

and $(N\alpha, p) > 1$. It follows, by Theorem 376, that $\alpha$ and $p$ have a common
right-hand divisor $\delta$ which is not a unity. If

\[ \alpha = \delta_1 \delta, \qquad p = \delta_2 \delta, \]

then $\delta_2$ is not a unity; for if it were then $\delta$ would be an associate of $p$, in
which case $p$ would divide all the coordinates of

\[ \alpha = \delta_1 \delta = \delta_1 \delta_2^{-1} p, \]

and in particular 1. Hence $p = \delta_2 \delta$, where neither $\delta$ nor $\delta_2$ is a unity, and
so $p$ is not prime.

To complete the proof of Theorem 377, suppose that $\pi$ is prime and $p$ a
rational prime divisor of $N\pi$. By Theorem 376, $\pi$ and $p$ have a common
right-hand divisor $\pi'$ which is not a unity. Since $\pi$ is prime, $\pi'$ is an
associate of $\pi$ and $N\pi' = N\pi$. Also $p = \lambda\pi'$, where $\lambda$ is integral; and
$p^2 = N\lambda\,N\pi' = N\lambda\,N\pi$, so that $N\lambda$ is 1 or $p$. If $N\lambda$ were 1, $p$ would be an
associate of $\pi'$ and $\pi$, and so a prime quaternion, which we have seen to
be impossible. Hence $N\pi = p$, a rational prime.

It is now easy to prove Theorem 370. If $p$ is any rational prime, $p = \lambda\pi$,
where $N\lambda = N\pi = p$. If $\pi$ has integral coordinates $a_0, a_1, a_2, a_3$, then

$$p = N\pi = a_0^2 + a_1^2 + a_2^2 + a_3^2.$$

If not then, by Theorem 371, there is an associate $\pi'$ of $\pi$ which has integral
coordinates. Since

$$p = N\pi = N\pi',$$

the conclusion follows as before.

The analysis of the preceding sections may be developed so as to lead
to a complete theory of the factorization of integral quaternions and of the
representation of rational integers by sums of four squares. In particular it
leads to formulae for the number of representations, analogous to those of
§§ 16.9-10. We shall prove these formulae by a different method in § 20.12,
and shall not pursue the arithmetic of quaternions further here. There is
however one other interesting theorem which is an immediate consequence
of our analysis. If we suppose $p$ odd, and select an associate $\pi'$ of $\pi$ whose
coordinates are halves of odd integers (as we may by Theorem 371), then

$$p = N\pi = N\pi' = (b_0 + \tfrac12)^2 + (b_1 + \tfrac12)^2 + (b_2 + \tfrac12)^2 + (b_3 + \tfrac12)^2,$$

where $b_0, \ldots$ are integers, and

$$4p = (2b_0 + 1)^2 + (2b_1 + 1)^2 + (2b_2 + 1)^2 + (2b_3 + 1)^2.$$

Hence we obtain

Theorem 379. If $p$ is an odd prime, then $4p$ is the sum of four odd integral
squares.

Thus $4 \cdot 3 = 12 = 1^2 + 1^2 + 1^2 + 3^2$ (but $4 \cdot 2 = 8$ is not the sum of
four odd integral squares).
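
Theorem 379 can be checked by brute force for small primes. The following is only an illustrative sketch: the helper names `is_prime` and `four_odd_squares` are ours, not the text's, and the search is a plain exhaustive one rather than anything derived from the quaternion argument.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def four_odd_squares(n):
    """Return odd a <= b <= c <= d with a^2 + b^2 + c^2 + d^2 == n, or None."""
    limit = isqrt(n)
    for a in range(1, limit + 1, 2):
        for b in range(a, limit + 1, 2):
            for c in range(b, limit + 1, 2):
                rest = n - a * a - b * b - c * c
                if rest < c * c:        # d >= c is no longer possible
                    break
                d = isqrt(rest)
                if d * d == rest and d % 2 == 1:
                    return (a, b, c, d)
    return None

# One representation of 4p for every odd prime p below 200.
odd_primes = [p for p in range(3, 200) if is_prime(p)]
examples = {p: four_odd_squares(4 * p) for p in odd_primes}
print(examples[3], four_odd_squares(8))
```

The second printed value illustrates the remark that 8 has no such representation.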

20.10. The values of g(2) and G(2). Theorem 369 shows that

$$G(2) \le g(2) \le 4.$$

On the other hand,

$$(2m)^2 \equiv 0 \pmod{4}, \quad (2m + 1)^2 \equiv 1 \pmod{8},$$

so that

$$x^2 \equiv 0,\ 1,\ \text{or}\ 4 \pmod{8}$$

and

$$x^2 + y^2 + z^2 \not\equiv 7 \pmod{8}.$$

Hence no number $8m + 7$ is representable by three squares, and we obtain
Theorem 380:

$$g(2) = G(2) = 4.$$

If $x^2 + y^2 + z^2 \equiv 0 \pmod{4}$, then all of $x, y, z$ are even, and

$$\tfrac14(x^2 + y^2 + z^2) = (\tfrac12 x)^2 + (\tfrac12 y)^2 + (\tfrac12 z)^2$$

is representable by three squares. It follows that no number $4^a(8m + 7)$ is
the sum of three squares. It can be proved that any number not of this form
is the sum of three squares, so that

$$n \ne 4^a(8m + 7)$$

is a necessary and sufficient condition for $n$ to be representable by three
squares; but the proof depends upon the theory of ternary quadratic forms
and cannot be included here.
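
Although the three-square criterion is not proved here, it is easy to test numerically. The sketch below compares a brute-force search with the condition $n \ne 4^a(8m+7)$ for small $n$; the function names are hypothetical and the range checked is arbitrary.

```python
from math import isqrt

def is_sum_of_three_squares(n):
    """Exhaustive search for x <= y and z with x^2 + y^2 + z^2 == n."""
    for x in range(isqrt(n) + 1):
        for y in range(x, isqrt(n - x * x) + 1):
            z2 = n - x * x - y * y
            z = isqrt(z2)
            if z * z == z2:
                return True
    return False

def excluded_form(n):
    """True if n = 4^a (8m + 7) for some a >= 0, m >= 0 (n positive)."""
    while n > 0 and n % 4 == 0:
        n //= 4
    return n % 8 == 7

# Numbers below 100 that are not sums of three squares.
exceptions = [n for n in range(1, 100) if not is_sum_of_three_squares(n)]
print(exceptions)
```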
20.11. Lemmas for the third proof of Theorem 369. Our third proof
of Theorem 369 is of a quite different kind and, although ‘elementary’,
belongs properly to the theory of elliptic functions.

The coefficient $r_4(n)$ of $x^n$ in

$$(1 + 2x + 2x^4 + \cdots)^4 = \Bigl(\sum_{m=-\infty}^{\infty} x^{m^2}\Bigr)^4$$

is the number of solutions of

$$n = m_1^2 + m_2^2 + m_3^2 + m_4^2$$

in rational integers, solutions differing only in the sign or order of the $m$
being reckoned as distinct. We have to prove that this coefficient is positive
for every $n$.

By Theorem 312

$$(1 + 2x + 2x^4 + \cdots)^2 = 1 + 4\Bigl(\frac{x}{1 - x} - \frac{x^3}{1 - x^3} + \frac{x^5}{1 - x^5} - \cdots\Bigr),$$

and we proceed to find a transformation of the square of the right-hand
side.

In what follows $x$ is any number, real or complex, for which $|x| < 1$. The
series which we use, whether simple or multiple, are absolutely convergent
for $|x| < 1$. The rearrangements to which we subject them are all justified
by the theorem that any absolutely convergent series, simple or multiple,
may be summed in any manner we please.

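We write

$$u_r = \frac{x^r}{1 - x^r},$$

so that

$$\frac{x^r}{(1 - x^r)^2} = u_r(1 + u_r).$$

We require two preliminary lemmas.

Theorem 381:

$$\sum_{m=1}^{\infty} u_m(1 + u_m) = \sum_{n=1}^{\infty} n u_n.$$

For

$$\sum_{m=1}^{\infty} \frac{x^m}{(1 - x^m)^2} = \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} n x^{mn} = \sum_{n=1}^{\infty} n \sum_{m=1}^{\infty} x^{mn} = \sum_{n=1}^{\infty} \frac{n x^n}{1 - x^n}.$$

Theorem 382:

$$\sum_{m=1}^{\infty} (-1)^{m-1} u_{2m}(1 + u_{2m}) = \sum_{n=1}^{\infty} (2n - 1) u_{4n-2}.$$

For

$$\sum_{m=1}^{\infty} \frac{(-1)^{m-1} x^{2m}}{(1 - x^{2m})^2} = \sum_{m=1}^{\infty} (-1)^{m-1} \sum_{r=1}^{\infty} r x^{2mr} = \sum_{r=1}^{\infty} r \sum_{m=1}^{\infty} (-1)^{m-1} x^{2mr} = \sum_{r=1}^{\infty} \frac{r x^{2r}}{1 + x^{2r}}$$

$$= \sum_{r=1}^{\infty} \Bigl(\frac{r x^{2r}}{1 - x^{2r}} - \frac{2r x^{4r}}{1 - x^{4r}}\Bigr) = \sum_{n=1}^{\infty} \frac{(2n - 1) x^{4n-2}}{1 - x^{4n-2}}.$$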

20.12. Third proof of Theorem 369: the number of representations.

We begin by proving an identity more general than the actual one we need.

Theorem 383. If $\theta$ is real and not an even multiple of $\pi$, and if

$$L = L(x, \theta) = \tfrac14\cot\tfrac12\theta + u_1\sin\theta + u_2\sin 2\theta + \cdots,$$

$$T_1 = T_1(x, \theta) = (\tfrac14\cot\tfrac12\theta)^2 + u_1(1 + u_1)\cos\theta + u_2(1 + u_2)\cos 2\theta + \cdots,$$

$$T_2 = T_2(x, \theta) = \tfrac12\{u_1(1 - \cos\theta) + 2u_2(1 - \cos 2\theta) + 3u_3(1 - \cos 3\theta) + \cdots\},$$

then

$$L^2 = T_1 + T_2.$$
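We have

$$L^2 = \Bigl(\tfrac14\cot\tfrac12\theta + \sum_{n=1}^{\infty} u_n\sin n\theta\Bigr)^2$$

$$= (\tfrac14\cot\tfrac12\theta)^2 + \tfrac12\sum_{n=1}^{\infty} u_n\cot\tfrac12\theta\,\sin n\theta + \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} u_m u_n\sin m\theta\,\sin n\theta$$

$$= (\tfrac14\cot\tfrac12\theta)^2 + S_1 + S_2,$$

say. We now use the identities

$$\tfrac12\cot\tfrac12\theta\,\sin n\theta = \tfrac12 + \cos\theta + \cos 2\theta + \cdots + \cos(n - 1)\theta + \tfrac12\cos n\theta,$$

$$2\sin m\theta\,\sin n\theta = \cos(m - n)\theta - \cos(m + n)\theta,$$

which give

$$S_1 = \sum_{n=1}^{\infty} u_n\{\tfrac12 + \cos\theta + \cos 2\theta + \cdots + \cos(n - 1)\theta + \tfrac12\cos n\theta\},$$

$$S_2 = \tfrac12\sum_{m=1}^{\infty}\sum_{n=1}^{\infty} u_m u_n\{\cos(m - n)\theta - \cos(m + n)\theta\},$$

and

$$L^2 = (\tfrac14\cot\tfrac12\theta)^2 + C_0 + \sum_{k=1}^{\infty} C_k\cos k\theta,$$

say, on rearranging $S_1$ and $S_2$ as series of cosines of multiples of $\theta$.†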



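† To justify this rearrangement we have to prove that

$$\sum_{n=1}^{\infty} |u_n|\,\bigl(\tfrac12 + |\cos\theta| + \cdots + \tfrac12|\cos n\theta|\bigr)$$

and

$$\sum_{m=1}^{\infty}\sum_{n=1}^{\infty} |u_m|\,|u_n|\,\bigl(|\cos(m + n)\theta| + |\cos(m - n)\theta|\bigr)$$

are convergent. But this is an immediate consequence of the absolute convergence of

$$\sum_{n=1}^{\infty} n u_n, \qquad \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} u_m u_n.$$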
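We consider $C_0$ first. This coefficient includes a contribution
$\tfrac12\sum_{1}^{\infty} u_n$ from $S_1$, and a contribution $\tfrac12\sum_{1}^{\infty} u_n^2$ from the terms of
$S_2$ for which $m = n$. Hence

$$C_0 = \tfrac12\sum_{n=1}^{\infty} (u_n + u_n^2) = \tfrac12\sum_{n=1}^{\infty} n u_n,$$

by Theorem 381.

Now suppose $k > 0$. Then $S_1$ contributes

$$\tfrac12 u_k + \sum_{n=k+1}^{\infty} u_n = \tfrac12 u_k + \sum_{l=1}^{\infty} u_{k+l}$$

to $C_k$, while $S_2$ contributes

$$\tfrac12\sum_{m-n=k} u_m u_n + \tfrac12\sum_{n-m=k} u_m u_n - \tfrac12\sum_{m+n=k} u_m u_n,$$

where $m \ge 1$, $n \ge 1$ in each summation. Hence

$$C_k = \tfrac12 u_k + \sum_{l=1}^{\infty} u_{k+l} + \sum_{l=1}^{\infty} u_l u_{k+l} - \tfrac12\sum_{l=1}^{k-1} u_l u_{k-l}.$$

The reader will easily verify that

$$u_l u_{k-l} = u_k(1 + u_l + u_{k-l})$$

and

$$u_{k+l} + u_l u_{k+l} = u_k(u_l - u_{k+l}).$$

Hence

$$C_k = u_k\Bigl\{\tfrac12 + \sum_{l=1}^{\infty} (u_l - u_{k+l}) - \tfrac12\sum_{l=1}^{k-1} (1 + u_l + u_{k-l})\Bigr\}$$

$$= u_k\{\tfrac12 + u_1 + u_2 + \cdots + u_k - \tfrac12(k - 1) - (u_1 + u_2 + \cdots + u_{k-1})\}$$

$$= u_k(1 + u_k - \tfrac12 k),$$

and so

$$L^2 = (\tfrac14\cot\tfrac12\theta)^2 + \tfrac12\sum_{n=1}^{\infty} n u_n + \sum_{k=1}^{\infty} u_k(1 + u_k - \tfrac12 k)\cos k\theta$$

$$= (\tfrac14\cot\tfrac12\theta)^2 + \sum_{k=1}^{\infty} u_k(1 + u_k)\cos k\theta + \tfrac12\sum_{k=1}^{\infty} k u_k(1 - \cos k\theta)$$

$$= T_1(x, \theta) + T_2(x, \theta).$$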
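
As a sanity check on Theorem 383, the two sides can be compared numerically with truncated series. This is a sketch under our own assumptions: the function name `theorem_383_sides`, the 80-term cut-off and the sample values of $x$ and $\theta$ are illustrative only, and $x$ is taken real for simplicity.

```python
from math import cos, sin, tan

def u(r, x):
    """u_r = x^r / (1 - x^r)."""
    return x**r / (1 - x**r)

def theorem_383_sides(x, theta, terms=80):
    """Return (L^2, T_1 + T_2) with every series cut off after `terms` terms."""
    quarter_cot = 0.25 / tan(theta / 2)
    idx = range(1, terms + 1)
    L = quarter_cot + sum(u(n, x) * sin(n * theta) for n in idx)
    T1 = quarter_cot**2 + sum(u(n, x) * (1 + u(n, x)) * cos(n * theta) for n in idx)
    T2 = 0.5 * sum(n * u(n, x) * (1 - cos(n * theta)) for n in idx)
    return L * L, T1 + T2

samples = [(0.3, 1.1), (0.5, 2.5), (-0.4, 0.7)]
results = [theorem_383_sides(x, theta) for x, theta in samples]
print(results)
```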

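Theorem 384:

$$(\tfrac14 + u_1 - u_3 + u_5 - u_7 + \cdots)^2 = \tfrac{1}{16} + \tfrac12(u_1 + 2u_2 + 3u_3 + 5u_5 + 6u_6 + 7u_7 + 9u_9 + \cdots),$$

where in the last series there are no terms in $u_4, u_8, u_{12}, \ldots$.

We put $\theta = \tfrac12\pi$ in Theorem 383. Then we have

$$T_1 = \tfrac{1}{16} - \sum_{m=1}^{\infty} (-1)^{m-1} u_{2m}(1 + u_{2m}),$$

$$T_2 = \tfrac12\sum_{m=1}^{\infty} (2m - 1)u_{2m-1} + 2\sum_{m=1}^{\infty} (2m - 1)u_{4m-2}.$$

Now, by Theorem 382,

$$T_1 = \tfrac{1}{16} - \sum_{m=1}^{\infty} (2m - 1)u_{4m-2},$$

and so

$$T_1 + T_2 = \tfrac{1}{16} + \tfrac12(u_1 + 2u_2 + 3u_3 + 5u_5 + \cdots).$$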

From Theorems 312 and 384 we deduce

Theorem 385:

$$(1 + 2x + 2x^4 + 2x^9 + \cdots)^4 = 1 + 8\sum{}' m u_m,$$

where $m$ runs through all positive integral values which are not multiples
of 4.

Finally,

$$8\sum{}' m u_m = 8\sum{}' \frac{m x^m}{1 - x^m} = 8\sum_{m}{}' m\sum_{r=1}^{\infty} x^{mr} = 8\sum_{n=1}^{\infty} c_n x^n,$$

where

$$c_n = \sum_{m \mid n,\ 4 \nmid m} m$$

is the sum of the divisors of $n$ which are not multiples of 4.

It is plain that $c_n > 0$ for all $n > 0$, and so $r_4(n) > 0$. This provides us
with another proof of Theorem 369; and we have also proved

Theorem 386. The number of representations of a positive integer n as
the sum of four squares, representations which differ only in order or sign
being counted as distinct, is 8 times the sum of the divisors of n which are
not multiples of 4.
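
Theorem 386 lends itself to a direct check: expand the fourth power of the series $1 + 2x + 2x^4 + \cdots$ up to some degree and compare coefficients with the divisor sum. The sketch below does this; `theta_power` and `r4_by_divisors` are hypothetical helper names, and the degree limit of 200 is arbitrary.

```python
def theta_power(limit, power):
    """Coefficients of (1 + 2x + 2x^4 + ...)^power up to x^limit."""
    theta = [0] * (limit + 1)
    m = 0
    while m * m <= limit:
        theta[m * m] = 1 if m == 0 else 2
        m += 1
    result = [1] + [0] * limit
    for _ in range(power):
        product = [0] * (limit + 1)
        for i, a in enumerate(result):
            if a:
                for j in range(limit + 1 - i):
                    if theta[j]:
                        product[i + j] += a * theta[j]
        result = product
    return result

def r4_by_divisors(n):
    """8 times the sum of the divisors of n that are not multiples of 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)

r4 = theta_power(200, 4)
print(r4[1], r4[2], r4_by_divisors(2))
```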

20.13. Representations by a larger number of squares. There are
similar formulae for the numbers of representations of $n$ by 6 or 8 squares.
Thus

$$r_6(n) = 16\sum_{d \mid n} \chi(d')\,d^2 - 4\sum_{d \mid n} \chi(d)\,d^2,$$

where $dd' = n$ and $\chi(d)$, as in § 16.9, is 1, $-1$, or 0 according as $d$ is
$4k + 1$, $4k - 1$, or $2k$; and

$$r_8(n) = 16(-1)^n\sum_{d \mid n} (-1)^d d^3.$$

These formulae are the arithmetical equivalents of the identities

$$(1 + 2x + 2x^4 + \cdots)^6 = 1 + 16\Bigl(\frac{1^2 x}{1 + x^2} + \frac{2^2 x^2}{1 + x^4} + \frac{3^2 x^3}{1 + x^6} + \cdots\Bigr) - 4\Bigl(\frac{1^2 x}{1 - x} - \frac{3^2 x^3}{1 - x^3} + \frac{5^2 x^5}{1 - x^5} - \cdots\Bigr)$$

and

$$(1 + 2x + 2x^4 + \cdots)^8 = 1 + 16\Bigl(\frac{1^3 x}{1 + x} + \frac{2^3 x^2}{1 - x^2} + \frac{3^3 x^3}{1 + x^3} + \cdots\Bigr).$$

These identities also can be proved in an elementary manner, but have their
roots in the theory of the elliptic modular functions. That $r_6(n)$ and $r_8(n)$
are positive for all $n$ is trivial after Theorem 369.

The formulae for $r_s(n)$, where $s = 10, 12, \ldots$, involve other arithmetical
functions of a more recondite type. Thus $r_{10}(n)$ involves sums of powers
of the complex divisors of $n$.

The corresponding problems for representations of $n$ by sums of an odd
number of squares are more difficult, as may be inferred from § 20.10.
When $s$ is 3, 5, or 7 the number of representations is expressible as a finite
sum involving the symbol $\left(\frac{m}{n}\right)$ of Legendre and Jacobi.
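
The formulae for $r_6(n)$ and $r_8(n)$ stated above can be compared with coefficients obtained by expanding the sixth and eighth powers of $1 + 2x + 2x^4 + \cdots$ directly. The sketch below is illustrative only; the helper names and the degree limit of 60 are assumptions of ours.

```python
def theta_power(limit, power):
    """Coefficients of (1 + 2x + 2x^4 + ...)^power up to x^limit."""
    theta = [0] * (limit + 1)
    m = 0
    while m * m <= limit:
        theta[m * m] = 1 if m == 0 else 2
        m += 1
    result = [1] + [0] * limit
    for _ in range(power):
        product = [0] * (limit + 1)
        for i, a in enumerate(result):
            if a:
                for j in range(limit + 1 - i):
                    if theta[j]:
                        product[i + j] += a * theta[j]
        result = product
    return result

def chi(d):
    """1, -1 or 0 according as d is 4k + 1, 4k - 1 or even."""
    return 0 if d % 2 == 0 else (1 if d % 4 == 1 else -1)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def r6_formula(n):
    return (16 * sum(chi(n // d) * d * d for d in divisors(n))
            - 4 * sum(chi(d) * d * d for d in divisors(n)))

def r8_formula(n):
    return 16 * (-1) ** n * sum((-1) ** d * d ** 3 for d in divisors(n))

r6 = theta_power(60, 6)
r8 = theta_power(60, 8)
print(r6[1], r6_formula(1), r8[1], r8_formula(1))
```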


NOTES 

§ 20.1. Waring made his assertion in Meditationes algebraicae (1770), 204-5, and
Lagrange proved that g(2) = 4 later in the same year. There is an exhaustive account of
the history of the four-square theorem in Dickson, History, ii, ch. viii.

Hilbert’s proof of the existence of g(k) for every k was published in Göttinger
Nachrichten (1909), 17-36, and Math. Annalen, 67 (1909), 281-305. Previous writers
had proved its existence when k = 3, 4, 5, 6, 7, 8, and 10, but its value had been determined
only for k = 3. The value of g(k) is now known for all k: that of G(k) for k = 2 and
k = 4 only. The determinations of g(k) rest on a previous determination of an upper bound
for G(k).

See also Dickson, History, ii, ch. 25, and our notes on Ch. XXI. 

Lord Saltoun drew my attention to an error on p. 394. 

§ 20.3. This proof is due to Hermite, Journal de math. (1), 13 (1848), 15 (Œuvres,
i. 264).

§ 20.4. The fourth proof is due to Grace, Journal London Math. Soc. 2 (1927), 3-8. 
Grace also gives a proof of Theorem 369 based on simple properties of four-dimensional 
lattices. 

§ 20.5. Bachet enunciated Theorem 369 in 1621, though he did not profess to have 
proved it. The proof in this section is substantially Euler’s. 

§§ 20.6-9. These sections are based on Hurwitz, Vorlesungen über die Zahlentheorie
der Quaternionen (Berlin, 1919). Hurwitz develops the theory in much greater detail, and
uses it to find the formulae of § 20.12. We go so far only as is necessary for the proof of
Theorem 370; we do not, for example, prove any general theorem concerning uniqueness
of factorization. There is another account of Hurwitz’s theory, with generalizations, in
Dickson, Algebren und ihre Zahlentheorie (Zurich, 1927), ch. 9.

Lipschitz (Untersuchungen über die Summen von Quadraten, Bonn, 1886) was the first
to develop and publish an arithmetic of quaternions, though Hamilton, the inventor of
quaternions, gave the same method in an unpublished letter in 1856 (see The Mathematical
papers of Sir Wm. R. Hamilton (ed. Halberstam and Ingram), xviii and Appendix 4).
Lipschitz (like Hamilton) defines an integral quaternion in the most obvious manner, viz.
as one with integral coordinates, but his theory is much more complicated than Hurwitz’s.
Later, Dickson [Proc. London Math. Soc. (2) 20 (1922), 225-32] worked out an alternative
and much simpler theory based on Lipschitz’s definition. We followed this theory in our
first edition, but it is less satisfactory than Hurwitz’s: it is not true, for example, in Dickson’s
theory, that any two integral quaternions have a highest common right-hand divisor.

§ 20.10. The ‘three-square theorem’, which we do not prove, is due to Legendre,
Essai sur la théorie des nombres (1798), 202, 398-9, and Gauss, D.A., § 291. Gauss
determined the number of representations. See Landau, Vorlesungen, i. 114-25. There is a
proof, depending on the methods of Liouville, referred to in the note on § 20.13 below, in
Uspensky and Heaslet, 465-74, and another proof, due to Ankeny (Proc. American Math.
Soc. 8 (1957), 316-19), depending only on Minkowski’s theorem (our Theorem 447) and
Dirichlet’s theorem (our Theorem 15).

§§ 20.11-12. Ramanujan, Collected papers, 138 et seq.

§ 20.13. The results for 6 and 8 squares are due to Jacobi, and are contained implicitly
in the formulae of §§ 40-42 of the Fundamenta nova. They are stated explicitly in Smith’s
Report on the theory of numbers (Collected papers, i. 306-7). Liouville gave formulae for
12 and 10 squares in the Journal de math. (2) 9 (1864), 296-8, and 11 (1866), 1-8. Glaisher,
Proc. London Math. Soc. (2) 5 (1907), 479-90, gave a systematic table of formulae for
$r_{2s}(n)$ up to 2s = 18, based on previous work published in vols. 36-39 of the Quarterly
Journal of Math. The formulae for 14 and 18 squares contain functions defined only as
the coefficients in certain modular functions and not arithmetically. Ramanujan (Collected
papers, no. 18) continues Glaisher’s table up to 2s = 24.

Boulyguine, in 1914, found general formulae for $r_{2s}(n)$ in which every function which
occurs has an arithmetical definition. Thus the formula for $r_{2s}(n)$ contains functions
$\sum \phi(x_1, x_2, \ldots, x_t)$, where $\phi$ is a polynomial, $t$ has one of the values 2s - 8, 2s - 16, ...,
and the summation is over all solutions of $x_1^2 + x_2^2 + \cdots + x_t^2 = n$. There are references to
Boulyguine’s work in Dickson’s History, ii. 317.

Uspensky developed the elementary methods which seem to have been used by Liouville 
in a series of papers published in Russian: references will be found in a later paper in Trans. 
Amer. Math. Soc. 30 (1928), 385-404. He carries his analysis up to 2s = 12, and states that 
his methods enable him to prove Boulyguine’s general formulae. 

A more analytic method, applicable also to representations by an odd number of squares,
has been developed by Hardy, Mordell, and Ramanujan. See Hardy, Trans. Amer. Math. Soc.
21 (1920), 255-84, and Ramanujan, ch. 9; Mordell, Quarterly Journal of Math. 48 (1920),
93-104, and Trans. Camb. Phil. Soc. 22 (1923), 361-72; Estermann, Acta arithmetica, 2
(1936), 47-79; and nos. 18 and 21 of Ramanujan’s Collected papers.

We defined Legendre’s symbol in § 6.5. Jacobi’s generalization is defined in the more
systematic treatises, e.g. in Landau, Vorlesungen, i. 47.

Self-contained formulae for the number of representations of a positive integer as the
sum of squares are nowadays seen to be explained by the theory of modular forms (see, for
example, Chapter 11 of H. Iwaniec, Topics in classical automorphic forms, Amer. Math.
Soc., 1997). Indeed one may consider positive-definite quadratic forms

$$Q(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j \quad (a_{ij} = a_{ji}\ \text{integers})$$

in complete generality by such methods.

An elegant result for such forms has been proved by Conway and Schneeberger (unpublished).
This states that if $Q$ represents every positive integer up to and including 15,
then it represents all positive integers. One cannot reduce the number 15, since in fact
$x_1^2 + 2x_2^2 + 5x_3^2 + 5x_4^2$ represents all positive integers except 15. A more difficult version
of this result has been established by Bhargava (Quadratic forms and their applications
(Dublin, 1999), 27-37, Contemp. Math., 272, Amer. Math. Soc., Providence, RI, 2000),
referring to forms

$$Q(x_1, \ldots, x_n) = \sum_{i \le j} a_{ij} x_i x_j \quad (a_{ij}\ \text{integers}).$$

In this case, if every integer up to 290 is represented then all integers are represented.
21.1. Biquadrates. We defined ‘Waring’s problem’ in § 20.1 as the
problem of determining $g(k)$ and $G(k)$, and solved it completely when
$k = 2$. The general problem is much more difficult. Even the proof of
the existence of $g(k)$ and $G(k)$ requires quite elaborate analysis; and the
value of $G(k)$ is not known for any $k$ but 2 and 4. We give a summary of
the present state of knowledge at the end of the chapter, but we shall prove
only a few special theorems, and these usually not the best of their kind
that are known.

It is easy to prove the existence of $g(4)$.

Theorem 387. $g(4)$ exists, and does not exceed 50.

The proof depends on Theorem 369 and the identity

(21.1.1) $6(a^2 + b^2 + c^2 + d^2)^2 = (a + b)^4 + (a - b)^4 + (c + d)^4 + (c - d)^4$
$\qquad + (a + c)^4 + (a - c)^4 + (b + d)^4 + (b - d)^4$
$\qquad + (a + d)^4 + (a - d)^4 + (b + c)^4 + (b - c)^4.$

We denote by $B_s$ a number which is the sum of $s$ or fewer biquadrates.
Thus (21.1.1) shows that

$$6(a^2 + b^2 + c^2 + d^2)^2 = B_{12},$$

and therefore, after Theorem 369, that

(21.1.2) $6x^2 = B_{12},$

for every $x$.

Now any positive integer $n$ is of the form

$$n = 6N + r,$$

where $N \ge 0$ and $r$ is 0, 1, 2, 3, 4, or 5. Hence (again by Theorem 369)

$$n = 6(x_1^2 + x_2^2 + x_3^2 + x_4^2) + r;$$

and therefore, by (21.1.2),

$$n = B_{12} + B_{12} + B_{12} + B_{12} + r = B_{48} + r = B_{53}$$

(since $r$ is expressible by at most 5 1’s). Hence $g(4)$ exists and is at
most 53.

It is easy to improve this result a little. Any $n \ge 81$ is expressible as

$$n = 6N + t,$$

where $N \ge 0$, and $t = 0, 1, 2, 81, 16$, or 17, according as $n \equiv 0, 1, 2, 3, 4$,
or 5 (mod 6). But

$$1 = 1^4, \quad 2 = 1^4 + 1^4, \quad 81 = 3^4, \quad 16 = 2^4, \quad 17 = 2^4 + 1^4.$$

Hence $t = B_2$, and therefore

$$n = B_{48} + B_2 = B_{50},$$

so that any $n \ge 81$ is $B_{50}$.

On the other hand it is easily verified that $n = B_{19}$ if $1 \le n \le 80$.
In fact only

$$79 = 4 \cdot 2^4 + 15 \cdot 1^4$$

requires 19 biquadrates.
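
Both ingredients of this section are easy to confirm by machine: the identity (21.1.1), and the claim that 79 is the only number up to 80 needing 19 biquadrates. The sketch below uses a simple dynamic programme; the names `identity_sides` and `min_biquadrates` are ours.

```python
from itertools import product

def identity_sides(a, b, c, d):
    """Left and right sides of (21.1.1)."""
    lhs = 6 * (a * a + b * b + c * c + d * d) ** 2
    pairs = [(a, b), (c, d), (a, c), (b, d), (a, d), (b, c)]
    rhs = sum((p + q) ** 4 + (p - q) ** 4 for p, q in pairs)
    return lhs, rhs

def min_biquadrates(limit):
    """best[n] = least number of positive fourth powers summing to n."""
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        candidates = []
        b = 1
        while b ** 4 <= n:
            candidates.append(best[n - b ** 4] + 1)
            b += 1
        best[n] = min(candidates)
    return best

identity_holds = all(
    lhs == rhs
    for lhs, rhs in (identity_sides(*v) for v in product(range(-3, 4), repeat=4))
)
best = min_biquadrates(80)
hardest = [n for n in range(1, 81) if best[n] == max(best)]
print(identity_holds, best[79], hardest)
```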

21.2. Cubes: the existence of G(3) and g(3). The proof of the existence
of $g(3)$ is more sophisticated (as is natural because a cube may be negative).
We prove first

Theorem 388:

$$G(3) \le 13.$$

We denote by $C_s$ a number which is the sum of $s$ non-negative cubes.
We suppose that $z$ runs through the values 7, 13, 19, ... congruent to
1 (mod 6), and that $I_z$ is the interval

$$\phi(z) = 11z^9 + (z^3 + 1)^3 + 125z^3 \le n \le 14z^9 = \psi(z).$$

It is plain that $\phi(z + 6) < \psi(z)$ for large $z$, so that the intervals $I_z$ ultimately
overlap, and every large $n$ lies in some $I_z$. It is therefore sufficient to prove
that every $n$ of $I_z$ is the sum of 13 non-negative cubes.

We prove that any $n$ of $I_z$ can be expressed in the form

(21.2.1) $n = N + 8z^9 + 6mz^3,$

where

(21.2.2) $N = C_5, \quad 0 < m < z^6.$

We shall then have

$$m = x_1^2 + x_2^2 + x_3^2 + x_4^2,$$

where $0 \le x_i < z^3$; and so

$$n = N + 8z^9 + 6z^3(x_1^2 + x_2^2 + x_3^2 + x_4^2)$$

$$= N + \sum_{i=1}^{4} \{(z^3 + x_i)^3 + (z^3 - x_i)^3\}$$

$$= C_5 + C_8 = C_{13}.$$

It remains to prove (21.2.1). We define $r$, $s$, and $N$ by

$$n \equiv 6r \pmod{z^3} \quad (1 \le r \le z^3),$$

$$n \equiv s + 4 \pmod{6} \quad (0 \le s \le 5),$$

$$N = (r + 1)^3 + (r - 1)^3 + 2(z^3 - r)^3 + (sz)^3.$$

Then $N = C_5$ and

$$0 < N < (z^3 + 1)^3 + 3z^9 + 125z^3 = \phi(z) - 8z^9 \le n - 8z^9,$$

so that

(21.2.3) $8z^9 < n - N < 14z^9.$

Now

$$N \equiv (r + 1)^3 + (r - 1)^3 - 2r^3 = 6r \equiv n \equiv n - 8z^9 \pmod{z^3}.$$

Also $x^3 \equiv x \pmod{6}$ for every $x$, and so

$$N \equiv r + 1 + r - 1 + 2(z^3 - r) + sz = 2z^3 + sz$$
$$\equiv (2 + s)z \equiv 2 + s \equiv n - 2$$
$$\equiv n - 8 \equiv n - 8z^9 \pmod{6}.$$

Hence $n - N - 8z^9$ is a multiple of $6z^3$. This proves (21.2.1), and the
inequality in (21.2.2) follows from (21.2.3).
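
The proof is constructive, and the construction can be followed step by step for a particular $n$. The sketch below does so for $z = 7$ and a sample $n$ in $I_7$; the function names and the sample value are our own, the four-square step is a plain exhaustive search, and `pow(6, -1, z3)` assumes Python 3.8 or later.

```python
from math import isqrt

def four_squares(m):
    """Exhaustive search for (a, b, c, d) with a^2 + b^2 + c^2 + d^2 == m."""
    for a in range(isqrt(m), -1, -1):
        for b in range(isqrt(m - a * a), -1, -1):
            rem = m - a * a - b * b
            for c in range(isqrt(rem), -1, -1):
                d = isqrt(rem - c * c)
                if d * d == rem - c * c:
                    return (a, b, c, d)
    raise ValueError("no representation found")

def thirteen_cubes(n, z):
    """Bases of 13 non-negative cubes summing to n, for n in the interval I_z."""
    z3 = z ** 3
    assert z % 6 == 1
    assert 11 * z ** 9 + (z3 + 1) ** 3 + 125 * z3 <= n <= 14 * z ** 9
    r = (n * pow(6, -1, z3)) % z3 or z3      # n = 6r (mod z^3), 1 <= r <= z^3
    s = (n - 4) % 6                          # n = s + 4 (mod 6), 0 <= s <= 5
    n_bases = [r + 1, r - 1, z3 - r, z3 - r, s * z]
    N = sum(t ** 3 for t in n_bases)
    m, remainder = divmod(n - N - 8 * z ** 9, 6 * z3)
    assert remainder == 0 and 0 < m < z ** 6
    xs = four_squares(m)
    return n_bases + [z3 + x for x in xs] + [z3 - x for x in xs]

sample_n = 500_000_000
bases = thirteen_cubes(sample_n, 7)
print(bases)
```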

The existence of $g(3)$ is a corollary of Theorem 388. It is however
interesting to show that the bound for $G(3)$ stated in the theorem is also a
bound for $g(3)$.

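21.3. A bound for g(3). We must begin by proving a sharpened form
of Theorem 388, with a definite limit beyond which all numbers are $C_{13}$.

Theorem 389. If $n \ge 10^{25}$, then $n = C_{13}$.

We prove first that $\phi(z + 6) \le \psi(z)$ if $z \ge 373$, or that

$$11t^9 + (t^3 + 1)^3 + 125t^3 \le 14(t - 6)^9,$$

i.e.

(21.3.1) $14\left(1 - \frac{6}{t}\right)^9 \ge 12 + \frac{3}{t^3} + \frac{128}{t^6} + \frac{1}{t^9},$

if $t \ge 379$. Now

$$(1 - \delta)^m \ge 1 - m\delta$$

if $0 < \delta < 1$. Hence

$$\left(1 - \frac{6}{t}\right)^9 \ge 1 - \frac{54}{t}$$

if $t > 6$; and so (21.3.1) is satisfied if

$$14\left(1 - \frac{54}{t}\right) \ge 12 + \frac{3}{t^3} + \frac{128}{t^6} + \frac{1}{t^9},$$

or if

$$2(t - 7 \cdot 54) \ge \frac{3}{t^2} + \frac{128}{t^5} + \frac{1}{t^8}.$$

This is clearly true if $t \ge 7 \cdot 54 + 1 = 379$.

It follows that the intervals $I_z$ overlap from $z = 373$ onwards, and $n$
certainly lies in an $I_z$ if

$$n \ge 14(373)^9,$$

which is less than $10^{25}$.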
We have now to consider representations of numbers less than $10^{25}$. It
is known from tables that all numbers up to 40000 are $C_9$, and that, among
these numbers, only 23 and 239 require as many cubes as 9.

Hence

$$n = C_9 \quad (1 \le n \le 239), \qquad n = C_8 \quad (240 \le n \le 40000).$$

Next, if $N \ge 1$ and $m = [N^{1/3}]$, we have

$$N - m^3 = (N^{1/3})^3 - m^3 \le 3N^{2/3}(N^{1/3} - m) < 3N^{2/3}.$$

Now let us suppose that

$$240 \le n \le 10^{25}$$

and put $n = 240 + N$, $0 \le N < 10^{25}$.

Then

$$N = m^3 + N_1, \quad m = [N^{1/3}], \quad 0 \le N_1 < 3N^{2/3},$$

$$N_1 = m_1^3 + N_2, \quad m_1 = [N_1^{1/3}], \quad 0 \le N_2 < 3N_1^{2/3},$$

$$\cdots$$

$$N_4 = m_4^3 + N_5, \quad m_4 = [N_4^{1/3}], \quad 0 \le N_5 < 3N_4^{2/3}.$$

Hence

(21.3.2) $n = 240 + N = 240 + N_5 + m^3 + m_1^3 + m_2^3 + m_3^3 + m_4^3.$

Here

$$0 \le N_5 \le 3N_4^{2/3} \le 3(3N_3^{2/3})^{2/3} \le \cdots \le 27\left(\frac{N}{27}\right)^{(2/3)^5} < 27\left(\frac{10^{25}}{27}\right)^{(2/3)^5} < 35000.$$

Hence

$$240 \le 240 + N_5 < 35240 < 40000,$$

and so $240 + N_5$ is $C_8$; and therefore, by (21.3.2), $n$ is $C_{13}$. Hence all
positive integers are sums of 13 cubes.

Theorem 390:

$$g(3) \le 13.$$

The true value of g(3) is 9, but the proof of this demands Legendre’s 
theorem (§ 20.10) on the representation of numbers by sums of three 
squares. We have not proved this theorem and are compelled to use Theo- 
rem 369 instead, and it is this which accounts for the imperfection of our 
result. 
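
The greedy descent used in § 21.3 can be traced numerically. The following sketch removes the largest possible cube five times, starting from $N = n - 240$, and reports the final remainder $N_5$, which the argument shows to be below 35000 when $n \le 10^{25}$. The names `icbrt` and `greedy_descent` are hypothetical, and the integer cube root is computed by bisection to avoid floating-point error.

```python
def icbrt(n):
    """Largest integer m with m**3 <= n, for n >= 0."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

def greedy_descent(n, steps=5):
    """Return ([m, m_1, ..., m_4], N_5) for n >= 240, as in (21.3.2)."""
    remainder = n - 240
    bases = []
    for _ in range(steps):
        m = icbrt(remainder)
        bases.append(m)
        remainder -= m ** 3
    return bases, remainder

# The interval bound quoted in the text, and one descent near the top of the range.
overlap_bound = 14 * 373 ** 9
bases, n5 = greedy_descent(10 ** 25 - 1)
print(overlap_bound < 10 ** 25, bases, n5)
```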

21.4. Higher powers. In § 21.1 we used the identity (21.1.1) to deduce
the existence of $g(4)$ from that of $g(2)$. There are similar identities which
enable us to deduce the existence of $g(6)$ and $g(8)$ from that of $g(3)$ and
$g(4)$. Thus

(21.4.1) $60(a^2 + b^2 + c^2 + d^2)^3 = \sum (a \pm b \pm c)^6 + 2\sum (a \pm b)^6 + 36\sum a^6.$

On the right there are

$$16 + 2 \cdot 12 + 36 \cdot 4 = 184$$

sixth powers. Now any $n$ is of the form

$$60N + r \quad (0 \le r \le 59);$$

and

$$60N = 60\sum_{i=1}^{g(3)} X_i^3 = 60\sum_{i=1}^{g(3)} (a_i^2 + b_i^2 + c_i^2 + d_i^2)^3,$$

which, by (21.4.1), is the sum of $184g(3)$ sixth powers. Hence $n$ is the
sum of

$$184g(3) + r \le 184g(3) + 59$$

sixth powers; and so, by Theorem 390,
Theorem 391:

$$g(6) \le 184g(3) + 59 \le 2451.$$

Again, the identity

(21.4.2) $5040(a^2 + b^2 + c^2 + d^2)^4 = 6\sum (2a)^8 + 60\sum (a \pm b)^8 + \sum (2a \pm b \pm c)^8 + 6\sum (a \pm b \pm c \pm d)^8$

has

$$6 \cdot 4 + 60 \cdot 12 + 48 + 6 \cdot 8 = 840$$

eighth powers on its right-hand side. Hence, as above, any number $5040N$
is the sum of $840g(4)$ eighth powers. Now any number up to 5039 is the
sum of at most 273 eighth powers of 1 or 2.† Hence, by Theorem 387,

Theorem 392:

$$g(8) \le 840g(4) + 273 \le 42273.$$
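
The identities (21.4.1) and (21.4.2) can be checked on small integer inputs. The sketch below expands the symmetric sums explicitly (each sum runs over all choices of variables and signs, which is how the term counts 184 and 840 arise); the function names are our own.

```python
from itertools import combinations, product

SIGNS = (1, -1)

def rhs_sixth(v):
    """Right side of (21.4.1) for v = (a, b, c, d)."""
    total = 0
    for p, q, r in combinations(v, 3):
        for s1, s2 in product(SIGNS, repeat=2):
            total += (p + s1 * q + s2 * r) ** 6
    for p, q in combinations(v, 2):
        total += 2 * ((p + q) ** 6 + (p - q) ** 6)
    return total + 36 * sum(t ** 6 for t in v)

def rhs_eighth(v):
    """Right side of (21.4.2) for v = (a, b, c, d)."""
    total = 6 * sum((2 * t) ** 8 for t in v)
    for p, q in combinations(v, 2):
        total += 60 * ((p + q) ** 8 + (p - q) ** 8)
    for i in range(4):
        others = [v[j] for j in range(4) if j != i]
        for p, q in combinations(others, 2):
            for s1, s2 in product(SIGNS, repeat=2):
                total += (2 * v[i] + s1 * p + s2 * q) ** 8
    a, b, c, d = v
    for s1, s2, s3 in product(SIGNS, repeat=3):
        total += 6 * (a + s1 * b + s2 * c + s3 * d) ** 8
    return total

def norm(v):
    return sum(t * t for t in v)

grid = list(product(range(-2, 3), repeat=4))
sixth_ok = all(60 * norm(v) ** 3 == rhs_sixth(v) for v in grid)
eighth_ok = all(5040 * norm(v) ** 4 == rhs_eighth(v) for v in grid)
print(sixth_ok, eighth_ok)
```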

The results of Theorems 391 and 392 are, numerically, very poor; and 
the theorems are really interesting only as existence theorems. It is known 
that g(6) = 73 and that g(8) = 279. 

† The worst number is $4863 = 18 \cdot 2^8 + 255 \cdot 1^8$.

21.5. A lower bound for g(k). We have found upper bounds for $g(k)$,
and a fortiori for $G(k)$, for $k = 3$, 4, 6, and 8, but they are a good deal
larger than those given by deeper methods. There is also the problem of
finding lower bounds, and here elementary methods are relatively much
more effective. It is indeed quite easy to prove all that is known at present.

We begin with $g(k)$. Let us write $q = [(\tfrac32)^k]$. The number

$$n = 2^k q - 1 < 3^k$$

can only be represented by the powers $1^k$ and $2^k$. In fact

$$n = (q - 1)2^k + (2^k - 1)1^k,$$

and so $n$ requires just

$$q - 1 + 2^k - 1 = 2^k + q - 2$$

$k$th powers. Hence

Theorem 393:

$$g(k) \ge 2^k + q - 2.$$

In particular $g(2) \ge 4$, $g(3) \ge 9$, $g(4) \ge 19$, $g(5) \ge 37$, .... It is
known that $g(k) = 2^k + q - 2$ for all values of $k$ up to 400 except perhaps
4 and 5, and it is quite likely that this is true for every $k$.
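
The bound of Theorem 393 is simple to tabulate. The sketch below computes $q$ with exact integer arithmetic and counts the $k$th powers needed for $n = 2^k q - 1$ when only $1^k$ and $2^k$ are available; the function names are illustrative.

```python
def lower_bound_g(k):
    """2^k + q - 2 with q = floor((3/2)^k), computed exactly."""
    q = 3 ** k // 2 ** k
    return 2 ** k + q - 2

def powers_needed(k):
    """(n, count) for n = 2^k q - 1, using as many copies of 2^k as possible."""
    q = 3 ** k // 2 ** k
    n = 2 ** k * q - 1
    twos, ones = divmod(n, 2 ** k)
    return n, twos + ones

table = {k: lower_bound_g(k) for k in (2, 3, 4, 5, 6, 8)}
print(table)
```

For $k = 6$ and $k = 8$ the table reproduces the values 73 and 279 quoted at the end of § 21.4.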

21.6. Lower bounds for G(k). Passing to $G(k)$, we prove first a general
theorem for every $k$.

Theorem 394:

$$G(k) \ge k + 1 \quad \text{for}\ k \ge 2.$$

Let $A(N)$ be the number of numbers $n \le N$ which are representable in
the form

(21.6.1) $n = x_1^k + x_2^k + \cdots + x_k^k,$

where $x_i \ge 0$. We may suppose the $x_i$ arranged in ascending order of
magnitude, so that

(21.6.2) $0 \le x_1 \le x_2 \le \cdots \le x_k \le N^{1/k}.$

Hence $A(N)$ does not exceed the number of solutions of the inequalities
(21.6.2), which is

$$B(N) = \sum_{x_k=0}^{[N^{1/k}]}\ \sum_{x_{k-1}=0}^{x_k}\ \sum_{x_{k-2}=0}^{x_{k-1}} \cdots \sum_{x_1=0}^{x_2} 1.$$

The summation with respect to $x_1$ gives $x_2 + 1$, that with respect to $x_2$ gives

$$\sum_{x_2=0}^{x_3} (x_2 + 1) = \frac{(x_3 + 1)(x_3 + 2)}{2!},$$

that with respect to $x_3$ gives

$$\sum_{x_3=0}^{x_4} \frac{(x_3 + 1)(x_3 + 2)}{2!} = \frac{(x_4 + 1)(x_4 + 2)(x_4 + 3)}{3!},$$

and so on; so that

(21.6.3) $B(N) = \frac{1}{k!}\prod_{r=1}^{k} ([N^{1/k}] + r) \sim \frac{N}{k!}$

for large $N$.

On the other hand, if $G(k) \le k$, all but a finite number of $n$ are
representable in the form (21.6.1), and

$$A(N) > N - C,$$

where $C$ is independent of $N$. Hence

$$N - C < A(N) \le B(N) \sim \frac{N}{k!},$$

which is plainly impossible when $k > 1$. It follows that $G(k) > k$.
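
The counting argument can be illustrated for $k = 3$. The sketch below enumerates the solutions of (21.6.2), compares their number with the closed form in (21.6.3), and counts how many $n$ with $0 \le n \le N$ are actually sums of three non-negative cubes. The helper names, the inclusion of $n = 0$ in the count, and the choice $N = 10^4$ are assumptions made for the example.

```python
from itertools import combinations_with_replacement
from math import comb

def kth_root_floor(N, k):
    """Largest integer m with m**k <= N (simple search, fine for small N)."""
    m = 0
    while (m + 1) ** k <= N:
        m += 1
    return m

def B(N, k):
    """Closed form (21.6.3): the product of ([N^(1/k)] + r) for r = 1..k, over k!."""
    return comb(kth_root_floor(N, k) + k, k)

def A(N, k):
    """Number of n with 0 <= n <= N that are sums of k non-negative kth powers."""
    top = kth_root_floor(N, k)
    values = set()
    for xs in combinations_with_replacement(range(top + 1), k):
        total = sum(x ** k for x in xs)
        if total <= N:
            values.add(total)
    return len(values)

N, k = 10_000, 3
tuples = sum(1 for _ in combinations_with_replacement(range(kth_root_floor(N, k) + 1), k))
print(A(N, k), B(N, k), N)
```

Already at this size $A(N)$ is far below $N$, in line with the estimate $B(N) \sim N/k!$.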

Theorem 394 gives the best known universal lower bound for $G(k)$.
There are arguments based on congruences which give equivalent, or better,
results for special forms of $k$. Thus

$$x^3 \equiv 0,\ 1,\ \text{or}\ -1 \pmod{9},$$

and so at least 4 cubes are required to represent a number $N = 9m \pm 4$.
This proves that $G(3) \ge 4$, a special case of Theorem 394.

Again

(21.6.4) $x^4 \equiv 0\ \text{or}\ 1 \pmod{16},$

and so all numbers $16m + 15$ require at least 15 biquadrates. It follows that
$G(4) \ge 15$. This is a much better result than that given by Theorem 394,
and we can improve it slightly.

It follows from (21.6.4) that, if $16n$ is the sum of 15 or fewer biquadrates,
each of these biquadrates must be a multiple of 16. Hence

$$16n = \sum_{i=1}^{15} x_i^4 = \sum_{i=1}^{15} (2y_i)^4$$

and so

$$n = \sum_{i=1}^{15} y_i^4.$$

Hence, if $16n$ is the sum of 15 or fewer biquadrates, so is $n$. But 31 is not
the sum of 15 or fewer biquadrates; and so $16^m \cdot 31$ is not, for any $m$. Hence

Theorem 395:

$$G(4) \ge 16.$$
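
The congruences and the special role of 31 can be confirmed directly. The sketch below lists the residues of cubes (mod 9) and of fourth powers (mod 16), and uses a small dynamic programme (our own helper `min_biquadrates`) to find the least number of biquadrates for 31 and for $16 \cdot 31 = 496$.

```python
def min_biquadrates(n):
    """Least number of positive fourth powers with sum n (dynamic programme)."""
    best = [0] * (n + 1)
    for value in range(1, n + 1):
        candidates = []
        b = 1
        while b ** 4 <= value:
            candidates.append(best[value - b ** 4] + 1)
            b += 1
        best[value] = min(candidates)
    return best[n]

cube_residues = sorted({x ** 3 % 9 for x in range(9)})
fourth_power_residues = sorted({x ** 4 % 16 for x in range(16)})
print(cube_residues, fourth_power_residues)
print(min_biquadrates(31), min_biquadrates(16 * 31))
```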


More generally

Theorem 396:

$$G(2^\theta) \ge 2^{\theta+2} \quad \text{if}\ \theta \ge 2.$$

The case $\theta = 2$ has been dealt with already. If $\theta > 2$, then

$$k = 2^\theta > \theta + 2.$$

Hence, if $x$ is even,

$$x^{2^\theta} \equiv 0 \pmod{2^{\theta+2}},$$

while if $x$ is odd then

$$x^{2^\theta} = (1 + 2m)^{2^\theta} \equiv 1 + 2^{\theta+1}m + 2^{\theta+1}(2^\theta - 1)m^2$$
$$\equiv 1 - 2^{\theta+1}m(m - 1) \equiv 1 \pmod{2^{\theta+2}}.$$

Thus

(21.6.5) $x^{2^\theta} \equiv 0\ \text{or}\ 1 \pmod{2^{\theta+2}}.$

Now let $n$ be any odd number and suppose that $2^{\theta+2}n$ is the sum of
$2^{\theta+2} - 1$ or fewer $k$th powers. Then each of these powers must be even,
by (21.6.5), and so divisible by $2^k$. Hence $2^{k-\theta-2} \mid n$, and so $n$ is even; a
contradiction which proves Theorem 396.

It will be observed that the last stage in the proof fails for $\theta = 2$, when
a special device is needed.

There are three more theorems which, when they are applicable, give 
better results than Theorem 394. 
Theorem 397. If $p > 2$ and $\theta \ge 0$, then $G\{p^\theta(p - 1)\} \ge p^{\theta+1}$.

For example,

$$G(6) \ge 9.$$

If $k = p^\theta(p - 1)$, then $\theta + 1 \le 3^\theta < k$. Hence

$$x^k \equiv 0 \pmod{p^{\theta+1}}$$

if $p \mid x$. On the other hand, if $p \nmid x$, we have

$$x^k = x^{p^\theta(p-1)} \equiv 1 \pmod{p^{\theta+1}}$$

by Theorem 72. Hence, if $p^{\theta+1}n$, where $p \nmid n$, is the sum of $p^{\theta+1} - 1$
or fewer $k$th powers, each of these powers must be divisible by $p^{\theta+1}$
and so by $p^k$. Hence $p^k \mid p^{\theta+1}n$, which is impossible; and therefore
$G(k) \ge p^{\theta+1}$.

Theorem 398. If $p > 2$ and $\theta \ge 0$, then $G\{\tfrac12 p^\theta(p - 1)\} \ge \tfrac12(p^{\theta+1} - 1)$.

For example, $G(10) \ge 12$.

It is plain that

$$k = \tfrac12 p^\theta(p - 1) \ge p^\theta \ge \theta + 1,$$

and that $k > \theta + 1$ except in the trivial case $p = 3$, $\theta = 0$, $k = 1$. Hence

$$x^k \equiv 0 \pmod{p^{\theta+1}}$$

if $p \mid x$. On the other hand, if $p \nmid x$, then

$$x^{2k} = x^{p^\theta(p-1)} \equiv 1 \pmod{p^{\theta+1}}$$

by Theorem 72. Hence $p^{\theta+1} \mid (x^{2k} - 1)$, i.e.

$$p^{\theta+1} \mid (x^k - 1)(x^k + 1).$$

Since $p > 2$, $p$ cannot divide both $x^k - 1$ and $x^k + 1$, and so one of $x^k - 1$
and $x^k + 1$ is divisible by $p^{\theta+1}$. It follows that

$$x^k \equiv 0,\ 1,\ \text{or}\ -1 \pmod{p^{\theta+1}}$$

for every $x$; and therefore that numbers of the form

$$p^{\theta+1}m \pm \tfrac12(p^{\theta+1} - 1)$$

require at least $\tfrac12(p^{\theta+1} - 1)$ $k$th powers.

Theorem 399. If $\theta \ge 2$,† then $G(3 \cdot 2^\theta) \ge 2^{\theta+2}$.

This is a trivial corollary of Theorem 396, since $G(3 \cdot 2^\theta) \ge G(2^\theta) \ge
2^{\theta+2}$. We may sum up the results of this section in the following
theorem.

Theorem 400. $G(k)$ has the lower bounds

(i) $2^{\theta+2}$ if $k$ is $2^\theta$ or $3 \cdot 2^\theta$ and $\theta \ge 2$;

(ii) $p^{\theta+1}$ if $p > 2$ and $k = p^\theta(p - 1)$;

(iii) $\tfrac12(p^{\theta+1} - 1)$ if $p > 2$ and $k = \tfrac12 p^\theta(p - 1)$;

(iv) $k + 1$ in any case.

These are the best known lower bounds for $G(k)$. It is easily verified
that none of them exceeds $4k$, so that the lower bounds for $G(k)$ are much
smaller, for large $k$, than the lower bound for $g(k)$ assigned by Theorem
393. The value of $g(k)$ is, as we remarked in § 20.1, inflated by the difficulty
of representing certain comparatively small numbers.

It is to be observed that $k$ may be of several of the special forms mentioned
in Theorem 400. Thus

$$6 = 3(3 - 1) = 7 - 1 = \tfrac12(13 - 1),$$

so that 6 is expressible in two ways in the form (ii) and in one in the form
(iii). The lower bounds assigned by the theorem are

$$3^2 = 9, \quad 7^1 = 7, \quad \tfrac12(13 - 1) = 6, \quad 6 + 1 = 7;$$

and the first gives the strongest result.
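
Since a given $k$ may fall under several clauses of Theorem 400, it can be convenient to compute the strongest bound mechanically. The following is a sketch only; `theorem_400_bound` is a hypothetical name, and the search simply tries every odd prime $p \le 2k + 1$ and every admissible $\theta$.

```python
from math import isqrt

def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def theorem_400_bound(k):
    """Largest lower bound for G(k) among (i)-(iv) of Theorem 400, for k >= 2."""
    best = k + 1                                     # clause (iv)
    theta = 2
    while 2 ** theta <= k:                           # clause (i)
        if k in (2 ** theta, 3 * 2 ** theta):
            best = max(best, 2 ** (theta + 2))
        theta += 1
    for p in range(3, 2 * k + 2):
        if not is_prime(p):
            continue
        theta = 0
        while p ** theta * (p - 1) <= 2 * k:
            value = p ** theta * (p - 1)
            if value == k:                           # clause (ii)
                best = max(best, p ** (theta + 1))
            if value == 2 * k:                       # clause (iii)
                best = max(best, (p ** (theta + 1) - 1) // 2)
            theta += 1
    return best

bounds = {k: theorem_400_bound(k) for k in (3, 4, 6, 8, 10)}
print(bounds)
```

For $k = 6$ this returns 9, the strongest of the four values listed above.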


† The theorem is true for $\theta = 0$ and $\theta = 1$, but is then included in Theorems 394 and 397.
21.7. Sums affected with signs: the number v(k). It is also natural
to consider the representation of an integer $n$ as the sum of $s$ members of
the set

(21.7.1) $0,\ 1^k,\ 2^k,\ \ldots,\ -1^k,\ -2^k,\ -3^k,\ \ldots,$

or in the form

(21.7.2) $n = \pm x_1^k \pm x_2^k \pm \cdots \pm x_s^k.$

We use $v(k)$ to denote the least value of $s$ for which every $n$ is representable
in this manner.

The problem is in most ways more tractable than Waring’s problem,
but the solution is in one way still more incomplete. The value of $g(k)$ is
known for many $k$, while that of $v(k)$ has not been found for any $k$ but 2.
The main difficulty here lies in the determination of a lower bound for $v(k)$;
there is no theorem corresponding effectively to Theorem 393 or even to
Theorem 394.

Theorem 401: $v(k)$ exists for every $k$.

It is obvious that, if $g(k)$ exists, then $v(k)$ exists and does not exceed
$g(k)$. But the direct proof of the existence of $v(k)$ is very much easier than
that of the existence of $g(k)$.

We require a lemma.

Theorem 402:

$$\sum_{r=0}^{k-1} (-1)^{k-1-r}\binom{k-1}{r}(x + r)^k = k!\,x + d,$$

where $d$ is an integer independent of $x$.

The reader familiar with the elements of the calculus of finite differences
will at once recognize this as a well-known property of the $(k - 1)$th
difference of $x^k$. It is plain that, if

$$Q_k(x) = A_k x^k + \cdots$$

is a polynomial of degree $k$, then

$$\Delta Q_k(x) = Q_k(x + 1) - Q_k(x) = kA_k x^{k-1} + \cdots,$$

$$\Delta^2 Q_k(x) = k(k - 1)A_k x^{k-2} + \cdots,$$

$$\cdots$$

$$\Delta^{k-1} Q_k(x) = k!\,A_k x + d,$$

where $d$ is independent of $x$. The lemma is the case $Q_k(x) = x^k$. In fact
$d = \tfrac12(k - 1)(k!)$, but we make no use of this.
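
Theorem 402, together with the stated value $d = \tfrac12(k - 1)(k!)$, can be verified for small $k$ by evaluating the alternating sum directly. The function name `difference_sum` is ours.

```python
from math import comb, factorial

def difference_sum(k, x):
    """Left side of Theorem 402: the (k-1)th difference of x^k."""
    return sum(
        (-1) ** (k - 1 - r) * comb(k - 1, r) * (x + r) ** k
        for r in range(k)
    )

def expected(k, x):
    """k! x + d with d = (k - 1) k! / 2."""
    return factorial(k) * x + (k - 1) * factorial(k) // 2

checks = [(k, x) for k in range(1, 8) for x in range(-5, 6)]
print(difference_sum(4, 0), expected(4, 0))
```

For $k = 4$ the constant term is 36, the value of $d$ used in (21.8.2).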

It follows at once from the lemma that any number of the form $k!\,x + d$
is expressible as the sum of

$$\sum_{r=0}^{k-1} \binom{k-1}{r} = 2^{k-1}$$

numbers of the set (21.7.1); and

$$n - d = k!\,x + l, \quad -\tfrac12(k!) < l \le \tfrac12(k!)$$

for any $n$ and appropriate $l$ and $x$. Thus

$$n = (k!\,x + d) + l,$$

and $n$ is the sum of

$$2^{k-1} + |l| \le 2^{k-1} + \tfrac12(k!)$$

numbers of the set (21.7.1).

We have thus proved more than Theorem 401, viz.

Theorem 403:

$$v(k) \le 2^{k-1} + \tfrac12(k!).$$
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21.8. Upper bounds for v(k). The upper bound in Theorem 403 is 
generally much too large. 

It is plain, as we observed in § 21.7, that $v(k) \le g(k)$. We can also find
an upper bound for v(k) if we have one for G(k). For any number from a
certain N(k) onwards is the sum of G(k) positive kth powers, and

$$n + y^k > N(k)$$

for some y, so that

$$n = \sum_{i=1}^{G(k)} x_i^k - y^k$$

and

(21.8.1) $v(k) \le G(k) + 1.$

For all but a few small k, this is a much better bound than g(k). 

The bound of Theorem 403 can also be improved substantially by more 
elementary methods. Here we consider only special values of k for which 
such elementary arguments give bounds better than (21.8.1). 

(1) Squares. Theorem 403 gives $v(2) \le 3$, which also follows from the
identities

$$2x + 1 = (x+1)^2 - x^2$$

and

$$2x = x^2 - (x-1)^2 + 1^2.$$

On the other hand, 6 cannot be expressed by two squares, since it is not
the sum of two, and $x^2 - y^2 = (x-y)(x+y)$ is either odd or a multiple
of 4.

Theorem 404:

$$v(2) = 3.$$


(2) Cubes. Since

$$n^3 - n = (n-1)n(n+1) \equiv 0 \pmod 6$$

for any n, we have

$$n = n^3 - 6x = n^3 - (x+1)^3 - (x-1)^3 + 2x^3$$

for any n and some integral x. Hence $v(3) \le 5$.

On the other hand,

$$y^3 \equiv 0,\ 1,\ \text{or}\ -1 \pmod 9;$$

and so numbers $9m \pm 4$ require at least 4 cubes. Hence $v(3) \ge 4$.

Theorem 405: v(3) is 4 or 5.

It is not known whether 4 or 5 is the correct value of v(3). The identity

$$6x = (x+1)^3 + (x-1)^3 - 2x^3$$

shows that every multiple of 6 is representable by 4 cubes. Richmond and
Mordell have given many similar identities applying to other arithmetical
progressions. Thus the identity

$$6x + 3 = x^3 - (x-4)^3 + (2x-5)^3 - (2x-4)^3$$

shows that any odd multiple of 3 is representable by 4 cubes.
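The statements about cubes can be checked mechanically. The sketch below is an illustration only (the helper names are not from the text): it builds the five signed cubes for an arbitrary n, the four signed cubes for 6x + 3, and confirms that residues 4 and 5 (mod 9) are out of reach of three cubes.

```python
from itertools import product

def five_cubes(n: int) -> list[int]:
    """Five signed cubes summing to n, following n = n^3 - 6x."""
    x, rem = divmod(n ** 3 - n, 6)
    assert rem == 0                      # n^3 - n is always divisible by 6
    # 6x = (x+1)^3 + (x-1)^3 - 2x^3, hence n = n^3 - (x+1)^3 - (x-1)^3 + 2x^3.
    return [n ** 3, -(x + 1) ** 3, -(x - 1) ** 3, x ** 3, x ** 3]

def four_cubes_for_6x_plus_3(x: int) -> list[int]:
    """Four signed cubes summing to 6x + 3."""
    return [x ** 3, -(x - 4) ** 3, (2 * x - 5) ** 3, -(2 * x - 4) ** 3]

# Cubes are 0, 1 or 8 (that is, -1) modulo 9.
cube_residues_mod_9 = sorted({(y ** 3) % 9 for y in range(9)})

# Residues modulo 9 reachable with at most three signed cubes.
reachable_with_three = {sum(c) % 9 for c in product((0, 1, -1), repeat=3)}
```

Here `reachable_with_three` omits 4 and 5, which is the lower bound $v(3) \ge 4$.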

(3) Biquadrates. By Theorem 402, we have

(21.8.2) $(x+3)^4 - 3(x+2)^4 + 3(x+1)^4 - x^4 = 24x + d$

(where d = 36). The residues of $0^4, 1^4, 3^4, 2^4$ (mod 24) are 0, 1, 9, 16
respectively, and we can easily verify that every residue (mod 24) is the
sum of 4 at most of 0, ±1, ±9, ±16. We express this by saying that 0, 1,
9, 16 are fourth power residues (mod 24), and that any residue (mod 24) is
representable by 4 of these fourth power residues. Now we can express any
n in the form n = 24x + d + r, where $0 \le r < 24$; and (21.8.2) then shows
that any n is representable by 8 + 4 = 12 numbers $\pm y^4$. Hence $v(4) \le 12$.
On the other hand the only fourth power residues (mod 16) are 0 and 1,
and so a number 16m + 8 cannot be represented by 8 numbers $\pm y^4$ unless
they are all odd and of the same sign. Since there are numbers of this form,
e.g. 24, which are not sums of 8 biquadrates, it follows that $v(4) \ge 9$.
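The "easy verification" for biquadrates is a finite computation. A minimal sketch (names chosen for this example only) confirms the identity (21.8.2), the list of fourth power residues (mod 24), and that four signed residues always suffice while three do not.

```python
from itertools import product

def third_difference(x: int) -> int:
    """Left-hand side of (21.8.2)."""
    return (x + 3) ** 4 - 3 * (x + 2) ** 4 + 3 * (x + 1) ** 4 - x ** 4

fourth_power_residues_24 = sorted({(y ** 4) % 24 for y in range(24)})

signed_residues = [0, 1, -1, 9, -9, 16, -16]

def covered(count: int) -> set[int]:
    """Residues (mod 24) that are sums of at most `count` signed residues."""
    return {sum(c) % 24 for c in product(signed_residues, repeat=count)}

missing_with_three = sorted(set(range(24)) - covered(3))
```

The residues 4, 12 and 20 turn out to need all four terms, so the count 8 + 4 = 12 cannot be lowered by this particular argument.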

Theorem 406:

$$9 \le v(4) \le 12.$$

(4) Fifth powers. In this case Theorem 402 does not lead to the best
result; we use instead the identity

(21.8.3) $(x+3)^5 - 2(x+2)^5 + x^5 + (x-1)^5 - 2(x-3)^5 + (x-4)^5 = 720x - 360.$

A little calculation shows that every residue (mod 720) can be represented
by two fifth power residues. Hence $v(5) \le 8 + 2 = 10$.

The only fifth power residues (mod 11) are 0, 1, and -1, and so numbers
of the form $11m \pm 5$ require at least 5 fifth powers.

Theorem 407:

$$5 \le v(5) \le 10.$$
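The "little calculation" for fifth powers can be delegated to a short program. The following sketch is illustrative only and the variable names are assumptions of this example; it checks (21.8.3) and enumerates sums of two signed fifth power residues (mod 720).

```python
def lhs_21_8_3(x: int) -> int:
    """Left-hand side of (21.8.3); it uses 1 + 2 + 1 + 1 + 2 + 1 = 8 fifth powers."""
    return ((x + 3) ** 5 - 2 * (x + 2) ** 5 + x ** 5 + (x - 1) ** 5
            - 2 * (x - 3) ** 5 + (x - 4) ** 5)

# Fifth power residues (mod 720), closed under change of sign; 0 is included,
# so sums of two elements also cover "at most two" terms.
residues_720 = {pow(y, 5, 720) for y in range(720)}
signed_720 = residues_720 | {(-r) % 720 for r in residues_720}
two_term_sums = {(a + b) % 720 for a in signed_720 for b in signed_720}

# Fifth power residues (mod 11): only 0, 1 and 10 (that is, -1).
residues_11 = sorted({pow(y, 5, 11) for y in range(11)})
```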

21.9. The problem of Prouhet and Tarry: the number P(k, j). There
is another curious problem which has some connexion with that of § 21.8
(though we do not develop this connexion here).

Suppose that the a and b are integers and that

$$S_h = S_h(a) = a_1^h + a_2^h + \cdots + a_s^h;$$

and consider the system of k equations

(21.9.1) $S_h(a) = S_h(b) \quad (1 \le h \le k).$

It is plain that these equations are satisfied when the b are a permutation
of the a; such a solution we call a trivial solution.

It is easy to prove that there are no other solutions when $s \le k$. It is
sufficient to consider the case s = k. Then

$$b_1 + b_2 + \cdots + b_k,\quad b_1^2 + \cdots + b_k^2,\quad \ldots,\quad b_1^k + \cdots + b_k^k$$

have the same values as the same functions of the a, and therefore† the
elementary symmetric functions

$$\sum b_i,\quad \sum b_i b_j,\quad \ldots,\quad b_1 b_2 \cdots b_k$$

have the same values as the same functions of the a. Hence the a and the
b are the roots of the same algebraic equation, and the b are a permutation
of the a.

† By Newton’s relations between the coefficients of an equation and the sums of the powers of
its roots.

When s > k there may be non-trivial solutions, and we denote by P(k, 2)
the least value of s for which this is true. It is plain first (since there are no
non-trivial solutions when $s \le k$) that

(21.9.2) $P(k, 2) \ge k + 1.$

We may generalize our problem a little. Let us take $j \ge 2$, write

$$S_{hu} = a_{1u}^h + a_{2u}^h + \cdots + a_{su}^h$$

and consider the set of k(j - 1) equations

(21.9.3) $S_{h1} = S_{h2} = \cdots = S_{hj} \quad (1 \le h \le k).$

A non-trivial solution of (21.9.3) is one in which no two sets $a_{iu}$ $(1 \le i \le s)$
and $a_{iv}$ $(1 \le i \le s)$ with $u \ne v$ are permutations of one another. We write
P(k, j) for the least value of s for which there is a non-trivial solution.
Clearly a non-trivial solution of (21.9.3) for $j \ge 2$ includes a non-trivial
solution of (21.9.1) for the same s. Hence, by (21.9.2),

Theorem 408:

$$P(k, j) \ge P(k, 2) \ge k + 1.$$

In the other direction, we prove that

Theorem 409:

$$P(k, j) \le \tfrac12 k(k+1) + 1.$$

Write $s = \frac12 k(k+1) + 1$ and suppose that $n > s!\,s^k j$. Consider all the sets
of integers

(21.9.4) $a_1, a_2, \ldots, a_s$

for which

$$1 \le a_r \le n \quad (1 \le r \le s).$$

There are $n^s$ such sets. Since $1 \le a_r \le n$, we have

$$s \le S_h(a) \le s n^h.$$

Hence there are at most

$$\prod_{h=1}^{k} (s n^h - s + 1) < s^k n^{\frac12 k(k+1)} = s^k n^{s-1}$$

different sets

(21.9.5) $S_1(a), S_2(a), \ldots, S_k(a).$

Now

$$s!\,j \cdot s^k n^{s-1} < n^s,$$

and so at least $s!\,j$ of the sets (21.9.4) have the same set (21.9.5). But the
number of permutations of s things, like or unlike, is at most s!, and so
there are at least j sets (21.9.4), no two of which are permutations of one
another and which have the same set (21.9.5). These provide a non-trivial
solution of the equations (21.9.3) with

$$s = \tfrac12 k(k+1) + 1.$$
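The counting argument can be watched at work on a small scale. The sketch below is an illustration under simplifying assumptions: it enumerates unordered sets directly (so the factor s! of the proof is not needed) and uses small, hand-picked values of k, s and n. It groups the sets by their power sums $(S_1, \ldots, S_k)$ and keeps the groups with more than one member.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def equal_power_sum_classes(k: int, s: int, n: int) -> list[list[tuple[int, ...]]]:
    """Group multisets {a_1..a_s} from 1..n by (S_1, ..., S_k); keep groups of size >= 2."""
    buckets = defaultdict(list)
    for a in combinations_with_replacement(range(1, n + 1), s):
        signature = tuple(sum(v ** h for v in a) for h in range(1, k + 1))
        buckets[signature].append(a)
    # Each multiset is listed once in sorted order, so members of a group
    # are never permutations of one another.
    return [group for group in buckets.values() if len(group) >= 2]

classes = equal_power_sum_classes(k=2, s=3, n=7)
```

With k = 2 and s = 3 the sets (1, 5, 6) and (2, 3, 7) fall into the same group, while with s = 2 no group survives, in agreement with (21.9.2).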

21.10. Evaluation of P(k, j) for particular k and j. We prove

Theorem 410. P(k, j) = k + 1 for k = 2, 3, and 5 and all j.

By Theorem 408, we have only to prove that $P(k, j) \le k + 1$ and for
this it is sufficient to construct actual solutions of (21.9.3) for any given j.
By Theorem 337, for any fixed j, there is an n such that

$$n = c_1^2 + d_1^2 = c_2^2 + d_2^2 = \cdots = c_j^2 + d_j^2,$$

where all the numbers $c_1, c_2, \ldots, c_j, d_1, \ldots, d_j$ are positive and no two are
equal. If we put

$$a_{1u} = c_u,\quad a_{2u} = d_u,\quad a_{3u} = -c_u,\quad a_{4u} = -d_u,$$

it follows that

$$S_{1u} = 0,\quad S_{2u} = 2n,\quad S_{3u} = 0 \quad (1 \le u \le j),$$

and so we have a non-trivial solution of (21.9.3) for k = 3, s = 4.

Hence $P(3, j) \le 4$ and so P(3, j) = 4.
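A concrete instance of this construction may help. The sketch below assumes the particular number 325 = 1² + 18² = 6² + 17² = 10² + 15², chosen for this example and not taken from the text, and checks that the three resulting sets share their first three power sums.

```python
def power_sums(values, k):
    """[S_1, ..., S_k] for the given list of integers."""
    return [sum(v ** h for v in values) for h in range(1, k + 1)]

def sets_from_two_square_representations(pairs):
    """Each pair (c, d) with c^2 + d^2 = n gives the set {c, d, -c, -d}."""
    return [[c, d, -c, -d] for c, d in pairs]

# Three representations of n = 325 as a sum of two squares (illustrative choice).
pairs = [(1, 18), (6, 17), (10, 15)]
sets = sets_from_two_square_representations(pairs)
sums = [power_sums(a, 3) for a in sets]   # each is [0, 2n, 0] = [0, 650, 0]
```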

For k = 2 and k = 5, we use the properties of the quadratic field k(ρ)
found in Chapters XIII and XV. By Theorem 255, $\pi = 3 + \rho$ and $\bar\pi = 3 + \rho^2$
are conjugate primes with $\pi\bar\pi = 7$. They are not associates, since

$$\frac{\pi}{\bar\pi} = \frac{\pi^2}{\pi\bar\pi} = \frac{9 + 6\rho + \rho^2}{7} = \frac87 + \frac57\rho,$$

which is not an integer and so, a fortiori, not a unity. Now let u > 0 and
let $\pi^{2u} = A_u - B_u\rho$, where $A_u$, $B_u$ are rational integers. If $7 \mid A_u$, we have

$$\pi\bar\pi \mid A_u,\quad \pi \mid A_u,\quad \pi \mid B_u\rho$$

in k(ρ), and $N\pi \mid B_u^2$, $7 \mid B_u^2$, $7 \mid B_u$ in k(1). Finally $7 \mid \pi^{2u}$, $\pi\bar\pi \mid \pi^{2u}$, $\bar\pi \mid \pi^{2u-1}$,
$\bar\pi \mid \pi$ in k(ρ), which is false. Hence $7 \nmid A_u$ and, similarly, $7 \nmid B_u$.

If we write $c_u = 7^{j-u} A_u$, $d_u = 7^{j-u} B_u$, we have

$$c_u^2 + c_u d_u + d_u^2 = N(c_u - d_u\rho) = 7^{2j-2u} N\pi^{2u} = 7^{2j}.$$

Hence, if we put $a_{1u} = c_u$, $a_{2u} = d_u$, $a_{3u} = -(c_u + d_u)$, we have $S_{1u} = 0$
and

$$S_{2u} = c_u^2 + d_u^2 + (c_u + d_u)^2 = 2(c_u^2 + c_u d_u + d_u^2) = 2 \cdot 7^{2j}.$$

Since at least two of $(a_{1u}, a_{2u}, a_{3u})$ are divisible by $7^{j-u}$ but not by $7^{j-u+1}$,
no set is a permutation of any other set and we have a non-trivial solution
of (21.9.3) with k = 2 and s = 3. Thus P(2, j) = 3.

Incidentally, we have also

$$S_{4u} = c_u^4 + d_u^4 + (c_u + d_u)^4 = 2(c_u^2 + c_u d_u + d_u^2)^2 = 2 \cdot 7^{4j}$$

and so, for any j, we have a non-trivial solution of the equations

(21.10.1) $x_1^2 + y_1^2 + z_1^2 = x_2^2 + y_2^2 + z_2^2 = \cdots = x_j^2 + y_j^2 + z_j^2$

and

(21.10.2) $x_1^4 + y_1^4 + z_1^4 = x_2^4 + y_2^4 + z_2^4 = \cdots = x_j^4 + y_j^4 + z_j^4.$

For k = 5, we write

$$a_{1u} = c_u,\quad a_{2u} = d_u,\quad a_{3u} = -c_u - d_u,\quad a_{4u} = -a_{1u},\quad a_{5u} = -a_{2u},\quad a_{6u} = -a_{3u}$$

and have $S_{1u} = S_{3u} = S_{5u} = 0$, $S_{2u} = 4 \cdot 7^{2j}$, $S_{4u} = 4 \cdot 7^{4j}$.

As before, we have no trivial solutions and so P(5, j) = 6.
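The construction for k = 2 and k = 5 is explicit enough to run. In the sketch below (helper names are illustrative, and numbers a + bρ are stored as pairs (a, b), which is a representation chosen for this example), powers of π² = 8 + 5ρ are computed with ρ² = -1 - ρ and the sets $(c_u, d_u, -(c_u + d_u))$ are formed as in the text.

```python
def eis_mul(p, q):
    """Multiply a + b*rho and c + d*rho, using rho^2 = -1 - rho."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c - b * d)

def prouhet_tarry_sets(j: int):
    """Triples (c_u, d_u, -(c_u + d_u)) for u = 1..j, built from pi = 3 + rho."""
    pi_squared = eis_mul((3, 1), (3, 1))       # 8 + 5*rho
    power = (1, 0)
    sets = []
    for u in range(1, j + 1):
        power = eis_mul(power, pi_squared)     # pi^(2u) = A_u - B_u*rho
        A, B = power[0], -power[1]
        c, d = 7 ** (j - u) * A, 7 ** (j - u) * B
        sets.append((c, d, -(c + d)))
    return sets

def power_sums(values, k):
    return [sum(v ** h for v in values) for h in range(1, k + 1)]

triples = prouhet_tarry_sets(3)
# For k = 5, adjoin the negatives of each triple to obtain six numbers.
sextuples = [list(t) + [-v for v in t] for t in triples]
```

For j = 2 the triples are (56, -35, -21) and (39, -55, 16), each with $S_1 = 0$ and $S_2 = 2 \cdot 7^4 = 4802$.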

The fact that, in the last solution for example, $S_{1u} = S_{3u} = S_{5u} = 0$
does not make the solution so special as appears at first sight. For, if

$$a_{ru} = A_{ru} \quad (1 \le r \le s,\ 1 \le u \le j)$$

is one solution of (21.9.3), it can easily be verified that, for any d,

$$a_{ru} = A_{ru} + d$$

is another such solution. Thus we can readily obtain solutions in which
none of the S is zero.

The case j = 2 can be handled successfully by methods of little use for
larger j. If $a_1, a_2, \ldots, a_s, b_1, \ldots, b_s$ is a solution of (21.9.1), then

(21.10.3) $\sum_{i=1}^{s} \{(a_i + d)^h + b_i^h\} = \sum_{i=1}^{s} \{a_i^h + (b_i + d)^h\} \quad (1 \le h \le k + 1)$

for every d. For we may reduce these to

$$\sum_{l=1}^{h-1} \binom{h}{l} S_{h-l}(a)\, d^l = \sum_{l=1}^{h-1} \binom{h}{l} S_{h-l}(b)\, d^l \quad (2 \le h \le k + 1)$$

and these follow at once from (21.9.1).

We choose d to be the number which occurs most frequently as a
difference between two a or two b. We are then able to remove a good
many terms which occur on both sides of the identity (21.10.3).

We write

$$[a_1, \ldots, a_s]_k = [b_1, \ldots, b_s]_k$$

to denote that $S_h(a) = S_h(b)$ for $1 \le h \le k$.

Then

$$[0, 3]_1 = [1, 2]_1.$$

Using (21.10.3), with d = 3, we get

$$[1, 2, 6]_2 = [0, 4, 5]_2.$$

Starting from the last equation and taking d = 5 in (21.10.3), we obtain

$$[0, 4, 7, 11]_3 = [1, 2, 9, 10]_3.$$

From this we deduce in succession

$[1, 2, 10, 14, 18]_4 = [0, 4, 8, 16, 17]_4$ (d = 7),

$[0, 4, 9, 17, 22, 26]_5 = [1, 2, 12, 14, 24, 25]_5$ (d = 8),

$[1, 2, 12, 13, 24, 30, 35, 39]_6 = [0, 4, 9, 15, 26, 27, 37, 38]_6$ (d = 13),

$[0, 4, 9, 23, 27, 41, 46, 50]_7 = [1, 2, 11, 20, 30, 39, 48, 49]_7$ (d = 11).
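The chain above is produced by one repeated step, which can be written down directly. This is a minimal sketch with names chosen for this example: `shift_step` applies (21.10.3) for a given d and cancels the terms common to both sides.

```python
from collections import Counter

def shift_step(a, b, d):
    """From [a]_k = [b]_k build a pair valid for degree k + 1, cancelling common terms."""
    left = Counter(x + d for x in a) + Counter(b)
    right = Counter(a) + Counter(x + d for x in b)
    common = left & right
    return sorted((left - common).elements()), sorted((right - common).elements())

def power_sums_agree(a, b, k):
    return all(sum(x ** h for x in a) == sum(x ** h for x in b)
               for h in range(1, k + 1))

a, b = [0, 3], [1, 2]
chain = [(a, b)]
for d in (3, 5, 7, 8, 13, 11):
    a, b = shift_step(a, b, d)
    chain.append((a, b))
```

Each entry of `chain` reproduces one line of the list above; the choice of d matters only for how many terms cancel, not for the validity of the step.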

The example†

$$[0, 18, 27, 58, 64, 89, 101]_6 = [1, 13, 38, 44, 75, 84, 102]_6$$

shows that $P(k, 2) \le k + 1$ for k = 6; and these results, with Theorem 408,
give

Theorem 411. If $k \le 7$, P(k, 2) = k + 1.

21.11. Further problems of Diophantine analysis. We end this
chapter by a few unsystematic remarks about a number of Diophantine
equations which are suggested by Fermat’s problem of Ch. XIII.

(1) A conjecture of Euler. Can a kth power be the sum of s positive kth
powers? Is

(21.11.1) $x_1^k + x_2^k + \cdots + x_s^k = y^k$

soluble in positive integers? ‘Fermat’s last theorem’ asserts the impossibility
of the equation when s = 2 and k > 2, and Euler extended the
conjecture to the values 3, 4, ..., k - 1 of s. For k = 5, s = 4, however,
the conjecture is false, since

$$27^5 + 84^5 + 110^5 + 133^5 = 144^5.$$

† This may be proved by starting with

$$[1, 8, 12, 15, 20, 23, 27, 34]_1 = [0, 7, 11, 17, 18, 24, 28, 35]_1$$

and taking d = 7, 11, 13, 17, 19 in succession.

The equation

(21.11.2) $x_1^k + x_2^k + \cdots + x_k^k = y^k$

has also attracted much attention. The case k = 2 is familiar.† When k = 3,
we can derive solutions from the analysis of § 13.7. If we put λ = 1 and
a = -3b in (13.7.8), and then write $-\frac12 q$ for b, we obtain

(21.11.3) $x = 1 - 9q^3,\quad y = -1,\quad u = -9q^4,\quad v = 9q^4 - 3q;$

and so, by (13.7.2),

$$(9q^4)^3 + (3q - 9q^4)^3 + (1 - 9q^3)^3 = 1.$$

If we now replace q by ξ/η and multiply by $\eta^{12}$, we obtain the identity

(21.11.4) $(9\xi^4)^3 + (3\xi\eta^3 - 9\xi^4)^3 + (\eta^4 - 9\xi^3\eta)^3 = (\eta^4)^3.$

All the cubes are positive if

$$0 < \xi < 9^{-1/3}\eta,$$

so that any twelfth power $\eta^{12}$ can be expressed as a sum of three positive
cubes in at least $[9^{-1/3}\eta]$ ways.

When k > 3, little is known. A few particular solutions of (21.11.2) are
known for k = 4, the smallest of which is

(21.11.5) $30^4 + 120^4 + 272^4 + 315^4 = 353^4.$‡
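The identity (21.11.4) and the count of representations can be tried out directly. The sketch below is an illustration only, with hypothetical function names: it lists the triples of positive cube bases obtained from admissible integers ξ for a given η.

```python
def three_cubes_for_twelfth_power(xi: int, eta: int) -> tuple[int, int, int]:
    """Bases of the three cubes in (21.11.4); their cubes sum to eta**12."""
    return (9 * xi ** 4,
            3 * xi * eta ** 3 - 9 * xi ** 4,
            eta ** 4 - 9 * xi ** 3 * eta)

def positive_representations(eta: int) -> list[tuple[int, int, int]]:
    """One triple for each integer xi with 0 < xi < 9^(-1/3) * eta."""
    reps = []
    xi = 1
    while 9 * xi ** 3 < eta ** 3:      # same condition, without floating point
        reps.append(three_cubes_for_twelfth_power(xi, eta))
        xi += 1
    return reps

reps = positive_representations(5)
```

For η = 5 this gives the two triples (9, 366, 580) and (144, 606, 265), whose cubes each sum to $5^{12}$. The same few lines confirm (21.11.5) by direct arithmetic.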


† See § 13.2.

‡ The identity

$$(4x^4 - y^4)^4 + 2(4x^3y)^4 + 2(2xy^3)^4 = (4x^4 + y^4)^4$$

gives an infinity of biquadrates expressible as sums of 5 biquadrates (with two equal pairs); and the
identity

$$(x^2 - y^2)^4 + (2xy + y^2)^4 + (2xy + x^2)^4 = 2(x^2 + xy + y^2)^4$$

gives an infinity of solutions of

$$x_1^4 + x_2^4 + x_3^4 = y_1^4 + y_2^4$$

(all with $y_1 = y_2$).

For k = 5, there are an infinity included in the identity

(21.11.6) $(75y^5 - x^5)^5 + (x^5 + 25y^5)^5 + (x^5 - 25y^5)^5 + (10x^3y^2)^5 + (50xy^4)^5 = (x^5 + 75y^5)^5.$

All the powers are positive if $0 < 25y^5 < x^5 < 75y^5$. No solution is known
with $k \ge 6$.
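A quick numerical instance of (21.11.6) may be useful. The sketch below (function name chosen for this example) evaluates the six bases; the values x = 2, y = 1 satisfy $25y^5 < x^5 < 75y^5$ and were picked for illustration.

```python
def sastry_terms(x: int, y: int) -> tuple[list[int], int]:
    """Left-hand bases and right-hand base of (21.11.6)."""
    left = [75 * y ** 5 - x ** 5,
            x ** 5 + 25 * y ** 5,
            x ** 5 - 25 * y ** 5,
            10 * x ** 3 * y ** 2,
            50 * x * y ** 4]
    right = x ** 5 + 75 * y ** 5
    return left, right

left, right = sastry_terms(2, 1)   # 7^5 + 43^5 + 57^5 + 80^5 + 100^5 = 107^5
```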

(2) Equal sums of two kth powers. Is

(21.11.7) $x_1^k + y_1^k = x_2^k + y_2^k$

soluble in positive integers? More generally, is

(21.11.8) $x_1^k + y_1^k = x_2^k + y_2^k = \cdots = x_r^k + y_r^k$

soluble for given k and r?

The answers are affirmative when k = 2, since, by Theorem 337, we
can choose n so as to make r(n) as large as we please. We shall now prove
that they are also affirmative when k = 3.

Theorem 412. Whatever r, there are numbers which are representable
as sums of two positive cubes in at least r different ways.

We use two identities, viz.

(21.11.9) $X^3 - Y^3 = x_1^3 + y_1^3$

if

(21.11.10) $X = \dfrac{x_1(x_1^3 + 2y_1^3)}{x_1^3 - y_1^3},\quad Y = \dfrac{y_1(2x_1^3 + y_1^3)}{x_1^3 - y_1^3},$

and

(21.11.11) $x_2^3 + y_2^3 = X^3 - Y^3$

if

(21.11.12) $x_2 = \dfrac{X(X^3 - 2Y^3)}{X^3 + Y^3},\quad y_2 = \dfrac{Y(2X^3 - Y^3)}{X^3 + Y^3}.$

Each identity is an obvious corollary of the other, and either may be deduced
from the formulae of § 13.7.† From (21.11.9) and (21.11.11) it follows that

(21.11.13) $x_1^3 + y_1^3 = x_2^3 + y_2^3.$

Here $x_2, y_2$ are rational if $x_1, y_1$ are rational.

Suppose now that r is given, that $x_1$ and $y_1$ are rational and positive and
that

$$\frac{x_1}{4^{r-1} y_1}$$

is large. Then X, Y are positive, and X/Y is nearly $x_1/2y_1$; and $x_2, y_2$ are
positive and $x_2/y_2$ is nearly X/2Y or $x_1/4y_1$.

Starting now with $x_2, y_2$ in place of $x_1, y_1$, and repeating the argument,
we obtain a third pair of rationals $x_3, y_3$ such that

$$x_1^3 + y_1^3 = x_2^3 + y_2^3 = x_3^3 + y_3^3$$

and $x_3/y_3$ is nearly $x_1/4^2 y_1$. After r applications of the argument we obtain

(21.11.14) $x_1^3 + y_1^3 = x_2^3 + y_2^3 = \cdots = x_r^3 + y_r^3,$

all the numbers involved being positive rationals, and

$$\frac{x_1}{y_1},\quad \frac{4x_2}{y_2},\quad \frac{4^2 x_3}{y_3},\quad \ldots,\quad \frac{4^{r-1} x_r}{y_r}$$

all being nearly equal, so that the ratios $x_s/y_s$ (s = 1, 2, ..., r) are certainly
unequal. If we multiply (21.11.14) by $l^3$, where l is the least common
multiple of the denominators of $x_1, y_1, \ldots, x_r, y_r$, we obtain an integral
solution of the system (21.11.14).
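The proof of Theorem 412 is an algorithm, and exact rational arithmetic makes it easy to follow. The sketch below is illustrative: the starting pair (100, 1) and the function names are assumptions of this example, chosen so that $x_1/(4^{r-1} y_1)$ is comfortably large for r = 3. The resulting integers are very large.

```python
from fractions import Fraction
from math import lcm

def next_pair(x: Fraction, y: Fraction) -> tuple[Fraction, Fraction]:
    """One application of (21.11.10) followed by (21.11.12)."""
    X = x * (x ** 3 + 2 * y ** 3) / (x ** 3 - y ** 3)
    Y = y * (2 * x ** 3 + y ** 3) / (x ** 3 - y ** 3)
    x2 = X * (X ** 3 - 2 * Y ** 3) / (X ** 3 + Y ** 3)
    y2 = Y * (2 * X ** 3 - Y ** 3) / (X ** 3 + Y ** 3)
    return x2, y2

def rational_pairs(x1: int, y1: int, r: int) -> list[tuple[Fraction, Fraction]]:
    """r rational pairs (x_s, y_s) with equal sums of cubes."""
    pairs = [(Fraction(x1), Fraction(y1))]
    while len(pairs) < r:
        pairs.append(next_pair(*pairs[-1]))
    return pairs

def integral_pairs(pairs):
    """Clear denominators by multiplying through by their least common multiple."""
    l = lcm(*(v.denominator for p in pairs for v in p))
    return [(int(x * l), int(y * l)) for x, y in pairs]

pairs = rational_pairs(100, 1, 3)
ints = integral_pairs(pairs)
```

The ratios $x_s/y_s$ come out close to 100, 25 and 6.25, as the proof predicts.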

Solutions of

$$x_1^4 + y_1^4 = x_2^4 + y_2^4$$

can be deduced from the formulae (13.7.11); but no solution of

$$x_1^4 + y_1^4 = x_2^4 + y_2^4 = x_3^4 + y_3^4$$

is known. And no solution of (21.11.7) is known for $k \ge 5$.

† If we put a = b and λ = 1 in (13.7.8), we obtain

$$x = 8a^3 + 1,\quad y = 16a^3 - 1,\quad u = 4a - 16a^4,\quad v = 2a + 16a^4;$$

and if we replace a by $\frac12 q$, and use (13.7.2), we obtain

$$(q^4 - 2q)^3 + (2q^3 - 1)^3 = (q^4 + q)^3 - (q^3 + 1)^3,$$

an identity equivalent to (21.11.11).

We showed how to construct a solution of (21.10.2) for any j.
Swinnerton-Dyer has found a parametric solution of

(21.11.15) $x_1^5 + x_2^5 + x_3^5 = y_1^5 + y_2^5 + y_3^5$

which yields solutions in positive integers. A numerical solution is

(21.11.16) $49^5 + 75^5 + 107^5 = 39^5 + 92^5 + 100^5.$

The smallest result of this kind for sixth powers is

(21.11.17) $3^6 + 19^6 + 22^6 = 10^6 + 15^6 + 23^6.$


NOTES 

A great deal of work has been done on Waring’s problem during the last hundred years, 
and it may be worth while to give a short summary of the results. We have already referred 
to Waring’s original statement, to Hilbert’s proof of the existence of g(k), and to the proof 
that g(3) = 9 (Wieferich, Math. Annalen, 66 (1909), 99-101, corrected by Kempner, ibid. 
72 (1912), 387-97 and simplified by Scholz, Jber. Deutsch. Math. Ver. 58 (1955), Abt. 1, 
45-48). 

Landau [ibid. 66 (1909), 102-5] proved that $G(3) \le 8$ and it was not until 1942 that
Linnik [Comptes Rendus (Doklady) Acad. Sci. USSR, 35 (1942), 162] announced a proof
that $G(3) \le 7$. Dickson [Bull. Amer. Math. Soc. 45 (1939), 588-91] showed that 8 cubes
suffice for all but 23 and 239. See G. L. Watson, Math. Gazette, 37 (1953), 209-11, for a
simple proof that $G(3) \le 8$ and Journ. London Math. Soc. 26 (1951), 153-6 for one that
$G(3) \le 7$ and for further references. After Theorem 394, $G(3) \ge 4$, so that G(3) is 4, 5,
6, or 7; it is still uncertain which, though the evidence of tables points very strongly to 4
or 5. See Western, ibid. 1 (1926), 244-50. Deshouillers, Hennecart, and Landreau (Math.
Comp. 69 (2000), 421-39) have offered evidence to the effect that 7 373 170 279 850 is
the largest integer that cannot be represented as the sum of four positive integral cubes.

Hardy and Littlewood, in a series of papers under the general title ‘Some problems of 
partitio numerorum’, published between 1920 and 1928, developed a new analytic method 
for the study of Waring’s problem. They found upper bounds for G(k) for any k, the first
being

$$(k - 2)2^{k-1} + 5,$$

and the second a more complicated function of k which is asymptotic to $k2^{k-2}$ for large k.
In particular they proved that

(a) $G(4) \le 19,\quad G(5) \le 41,\quad G(6) \le 87,\quad G(7) \le 193,\quad G(8) \le 425.$

Their method did not lead to any new result for G(3); but they proved that ‘almost all’
numbers are sums of 5 cubes.

Davenport, Acta Math. 71 (1939), 123-43, has proved that almost all are sums of 4.
Since numbers 9m ± 4 require at least 4 cubes, this is the final result.

Hardy and Littlewood also found an asymptotic formula for the number of representations
for n by s kth powers, by means of the so-called ‘singular series’. Thus $r_{4,21}(n)$, the
number of representations of n by 21 biquadrates, is approximately

$$\frac{\{\Gamma(\frac54)\}^{21}}{\Gamma(\frac{21}{4})}\, n^{17/4} \left\{1 + 1.331\cos\left(\tfrac18 n\pi + \tfrac{11}{16}\pi\right) + 0.379\cos\left(\tfrac14 n\pi - \tfrac38\pi\right) + \cdots\right\}$$

(the later terms of the series being smaller). There is a detailed account of all this work
(except on its ‘numerical’ side) in Landau, Vorlesungen, i. 235-339.

As regards g(k), the best results known, up to 1933, for small k, were

$$g(4) \le 37,\quad g(5) \le 58,\quad g(6) \le 478,\quad g(7) \le 3806,\quad g(8) \le 31353$$

(due to Wieferich, Baer, Baer, Wieferich, and Kempner respectively). All these had been
found by elementary methods similar to those used in §§ 21.1-4. The results of Hardy and
Littlewood made it theoretically possible to find an upper bound for g(k) for any k, though
the calculations required for comparatively large k would have been impracticable. James,
however, in a paper published in Trans. Amer. Math. Soc. 36 (1934), 395-444, succeeded
in proving that

(b) $g(6) \le 183,\quad g(7) \le 322,\quad g(8) \le 595.$

He also found bounds for g(9) and g(10).

The later work of Vinogradov made it possible to obtain much more satisfactory results. 
Vinogradov’s earlier researches on Waring’s problem had been published in 1924, and there 
is an account of his method in Landau, Vorlesungen, i. 340-58. The method then used by
Vinogradov resembled that of Hardy and Littlewood in principle, but led more rapidly to
some of their results and in particular to a comparatively simple proof of Hilbert’s theorem.
It could also be used to find an upper bound for g(k). In his later work Vinogradov made very
important improvements, based primarily on a new and powerful method for the estimation
of certain trigonometrical sums, and obtained results which were, for large k, far better than
any known before. Thus he proved that

$$G(k) \le 6k \log k + (4 + \log 216)k;$$

so that G(k) is at most of order k log k. Vinogradov’s proof was afterwards simplified
considerably by Heilbronn, who proved that

(c) $G(k) \le 6k \log k + \left\{4 + 3\log\left(3 + \frac{2}{k}\right)\right\}k + 3.$

The resulting upper bound for G(k) is better than that of (a) for k > 6 (and naturally far better
for large values of k). Vinogradov (1947) improved his result to $G(k) \le k(3 \log k + 11)$,
Tong (1957) and Chen (1958) replaced the number 11 in this by 9 and 5.2 respectively,
while Vinogradov (Izv. Akad. Nauk SSSR Ser. Mat. 23 (1959), 637-42) proved that

(d) $G(k) \le k(2 \log k + 4 \log\log k + 2 \log\log\log k + 13)$

for all k in excess of 170,000.

More has been proved since concerning smaller k: in particular, the value of G(4) is now
known. Davenport [Annals of Math. 40 (1939), 731-47] proved that $G(4) \le 16$, so that,
after Theorem 395, G(4) = 16; and that any number not congruent to 14 or 15 (mod 16) is
a sum of 14 biquadrates. He also proved [Amer. Journal of Math. 64 (1942), 199-207] that
$G(5) \le 23$ and $G(6) \le 36$. It has been proved by Davenport’s method that $G(7) \le 53$ (Rao,
J. Indian Math. Soc. 5 (1941), 117-21 and Vaughan, Proc. London Math. Soc. 28 (1974),
387). Narasimhamurti (J. Indian Math. Soc. 5 (1941), 11-12) proved that $G(8) \le 73$ and
found upper bounds for k = 9 and 10, subsequently improved by Cook and Vaughan (Acta
Arith. 33 (1977), 231-53). The last-named proved that

$$G(9) \le 91,\quad G(10) \le 107,\quad G(11) \le 122,\quad G(12) \le 137.$$

Vaughan’s method leads to $G(k) \le k(3 \log k + 4.2)$ $(k \ge 9)$, which is better than (d) for
$k \le 2.131 \times 10^{10}$ (approx.) and otherwise worse.

Vinogradov’s work also led to very remarkable results concerning g(k). If we know
that G(k) does not exceed some upper bound $\overline{G}(k)$, so that numbers greater than C(k) are
representable by $\overline{G}(k)$ or fewer kth powers, then the way is open to the determination of
an upper bound for g(k). For we have only to study the representation of numbers up to
C(k), and this is logically, for a given k, a question of computation. It was thus that James
determined the bounds set out in (b); but the results of such work, before Vinogradov’s, were
inevitably unsatisfactory, since the bounds (a) for G(k) found by Hardy and Littlewood are
(except for quite small values of k) much too large, and in particular larger than the lower
bounds for g(k) given by Theorem 393.

If

$$\underline{g}(k) = 2^k + \left[\left(\tfrac32\right)^k\right] - 2$$

is the lower bound for g(k) assigned by Theorem 393, and if, for the moment,
we take $\overline{G}(k)$ to be the upper bound for G(k) assigned by (d), then $\underline{g}(k)$ is
of much higher order of magnitude than $\overline{G}(k)$. In fact $\underline{g}(k) > \overline{G}(k)$ for $k \ge 7$. Thus if
$k \ge 7$, if all numbers from C(k) on are representable by $\overline{G}(k)$ powers, and all numbers
below C(k) by $\underline{g}(k)$ powers, then

$$g(k) = \underline{g}(k).$$

And it is not necessary to determine the C(k) corresponding to this particular $\overline{G}(k)$; it is
sufficient to know the C(k) corresponding to any $\overline{G}(k) \le \underline{g}(k)$, and in particular to
$\overline{G}(k) = \underline{g}(k)$.

This type of argument led to an ‘almost complete’ solution of the original form of
Waring’s problem. The first, and deepest, part of the solution rests on an adaptation of
Vinogradov’s method. The second depends on an ingenious use of a ‘method of ascent’, a
simple case of which appears in the proof, in § 21.3, of Theorem 390.

Let us write

$$A = \left[\left(\tfrac32\right)^k\right],\quad B = 3^k - 2^k A,\quad D = \left[\left(\tfrac43\right)^k\right].$$

The final result is that

(e) $g(k) = 2^k + A - 2$

for all $k \ge 2$ for which

(f) $B \le 2^k - A - 2.$

In this case the value of g(k) is fixed by the number

$$n = 2^k A - 1 = (A - 1)2^k + (2^k - 1) \cdot 1^k$$

used in the proof of Theorem 393, a comparatively small number representable only by
powers of 1 and 2. The condition (f) is satisfied for $4 \le k \le 471\,600\,000$ (Kubina and
Wunderlich, Math. Comp. 55 (1990), 815-20) and may well be true for all k > 3. It can
only be false for at most a finite number of k (Mahler, Mathematika 4 (1957), 122-4).

It is known that $B \ne 2^k - A - 1$ and that $B \ne 2^k - A$ (except for k = 1). If $B \ge 2^k - A + 1$,
the formula for g(k) is different. In this case,

$$g(k) = 2^k + A + D - 3 \quad\text{if}\quad 2^k < AD + A + D$$

and

$$g(k) = 2^k + A + D - 2 \quad\text{if}\quad 2^k = AD + A + D.$$

It is readily shown that $2^k \le AD + A + D$.
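The formulas for g(k) quoted here are simple to evaluate. The sketch below is an illustration, not part of the original notes, and its function name is hypothetical. It computes A, B and D exactly with integer arithmetic and applies (e) when (f) holds, or the alternative formulas when $B \ge 2^k - A + 1$; values of k that fall in neither case as quoted are rejected rather than guessed at.

```python
def waring_g(k: int) -> int:
    """Evaluate g(k) from formula (e) and the two alternative cases quoted above."""
    A = 3 ** k // 2 ** k              # [(3/2)^k]
    B = 3 ** k - 2 ** k * A
    D = 4 ** k // 3 ** k              # [(4/3)^k]
    if B <= 2 ** k - A - 2:           # condition (f)
        return 2 ** k + A - 2
    if B >= 2 ** k - A + 1:
        if 2 ** k < A * D + A + D:
            return 2 ** k + A + D - 3
        return 2 ** k + A + D - 2
    raise ValueError("k falls outside the cases quoted in the text")

values = [waring_g(k) for k in range(3, 8)]
```

For k = 3, ..., 7 this gives 9, 19, 37, 73 and 143, in agreement with the values of g(3), g(4), g(5) and g(6) cited in these notes.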

Most of these results were found independently by Dickson [Amer. Journal of Math. 58
(1936), 521-9, 530-5] and Pillai [Journal Indian Math. Soc. (2) 2 (1936), 16-44, and Proc.
Indian Acad. Sci. (A), 4 (1936), 261]. They were completed by Pillai [ibid. 12 (1940),
30-40] who proved that g(6) = 73; by Rubugunday [Journal Indian Math. Soc. (2) 6
(1942), 192-8] who proved that $B \ne 2^k - A$; by Niven [Amer. Journal of Math. 66 (1944),
137-43] who proved (e) when $B = 2^k - A - 2$, a case previously unsolved; by Jing-run Chen
(Chinese Math. Acta 6 (1965), 105-27) who proved that g(5) = 37, and by Balasubramanian,
Deshouillers, and Dress, who have shown that g(4) = 19 (C. R. Acad. Sci. Paris. Sér. I
Math. 303 (1986), 85-88 and 161-3).

It will be observed that there is much more uncertainty about the value of G(k) than
about that of g(k); the most striking case is k = 3. This is natural, since the value of G(k)
depends on the deeper properties of the whole sequence of integers, and that of g(k) on the
more trivial properties of special numbers near the beginning.

Vaughan, The Hardy-Littlewood Method, gives an excellent account of the topic and a
full bibliography.

Much progress has been accomplished on topics associated with Waring’s problem over
the past three decades. A fairly comprehensive survey may be found in the paper of Vaughan
and Wooley in Surveys in Number Theory, Papers from the Millenial Conference in Number
Theory, (A. K. Peters, Ltd., MA, 2003). In brief, there have been two phases of activity. In the
first phase, pursued more or less independently by Thanigasalam and Vaughan throughout
the early 1980’s, the methods originally developed by Davenport (as cited earlier) were
refined to perfection. The papers of Vaughan (Proc. London Math. Soc. (3) 52 (1986),
45-63 and J. London Math. Soc. (2) 33 (1986), 227-36) represent the culmination of this
activity, in which it is shown that $G(5) \le 21$, $G(6) \le 31$, $G(7) \le 45$, $G(8) \le 62$ and
$G(9) \le 82$. Vaughan also proved that ‘almost all’ positive integers are sums of 32 eighth
powers, a conclusion that is best possible.

The landscape was then transformed at the end of the 1980’s with the introduction by
Vaughan of smooth numbers (that is, integers all of whose prime divisors are ‘small’)
into the Hardy-Littlewood method (see Acta Math. 162 (1989), 1-71). This led inter
alia to the bounds $G(5) \le 19$, $G(6) \le 29$, $G(7) \le 41$, $G(8) \le 57$, $G(9) \le 75$, ...,
$G(20) \le 248$. Subsequently, a new iterative element (‘repeated efficient differencing’)
was found by Wooley (Ann. of Math. (2) 135 (1992), 131-64) that delivered the sharper
bounds $G(6) \le 27$, $G(7) \le 36$, $G(8) \le 47$, $G(9) \le 55$, ..., $G(20) \le 146$, and for larger
exponents k, the upper bound $G(k) \le k(\log k + \log\log k + O(1))$. The latter provided the
first sizeable progress on Vinogradov’s estimate (d) from 1959. Wooley also showed that
‘almost all’ positive integers are the sum of 64 16th powers, and also the sum of 128 32nd
powers, each of which are best possible conclusions. The sharpest bounds currently (2007)
available from this circle of ideas are

$$G(5) \le 17,\quad G(6) \le 24,\quad G(7) \le 33,\quad G(8) \le 42,\quad G(9) \le 50,\quad \ldots,\quad G(20) \le 142$$

(see work of Vaughan and Wooley spanning the 1990’s summarised in Acta Arith.
(2000), 203-285), and

$$G(k) \le k(\log k + \log\log k + 2 + O(\log\log k / \log k))$$

(see Wooley, J. London Math. Soc. (2) 51 (1995), 1-13).

Further progress has been made on the topic of sums of fourth powers beyond the
conclusions of Davenport (1939) summarised above. Thus, Vaughan (Acta Math. 162 (1989),
1-71) has shown that whenever n is a large enough integer congruent to some number r
modulo 16, with $1 \le r \le 12$, then n is the sum of 12 fourth powers. Kawada and Wooley
(J. Reine Angew. Math. 512 (1999), 173-223) obtained a similar conclusion for sums of 11
fourth powers whenever n is congruent to some integer r modulo 16 with $1 \le r \le 10$.

§ 21.1. Liouville proved, in 1859, that $g(4) \le 53$. This upper bound was improved
gradually until Wieferich (1909) proved that $g(4) \le 37$ by elementary methods. Dickson
(1933) improved this to 35 by the methods described above and Dress (Comptes Rendus
272A (1971), 457-9) reduced it further to 30 by an adaptation of Hilbert’s method of proof
that g(k) exists. We have already referred to the proof by Balasubramanian, Deshouillers,
and Dress that g(4) = 19.

Complementing work of Davenport (Ann. of Math. (2) 40 (1939), 731-47) showing
that G(4) = 16, Deshouillers, Hennecart, Kawada, Landreau, and Wooley (J. Théor.
Nombres Bordeaux 12 (2000), 411-22 and Mém. Soc. Math. Fr. (N.S.) No. 100 (2005), vi+120pp.)
have recently established that the largest integer that is not the sum of 16 fourth powers is
13792. Amongst other devices, the proof makes use of the identity
$x^4 + y^4 + (x+y)^4 = 2(x^2 + xy + y^2)^2$, which also appears in the display preceding
equation (21.10.1) above.

References to the older literature relevant to this and the next few sections will be found 
in Bachmann, Niedere Zahlentheorie, ii. 328-48, or Dickson, History, ii, ch. xxv. 

§§ 21.2-3. See the note on § 20.1 and the historical note above.

§ 21.4. The proof for g(6) is due to Fleck. Maillet proved the existence of g(8) by a more
complicated identity than (21.4.2); the latter is due to Hurwitz. Schur found a similar proof
for g(10).

§ 21.5. The special numbers n considered here were observed by Euler (and probably 
by Waring). 

§ 21.6. Theorem 394 is due to Maillet and Hurwitz, and Theorems 395 and 396 to 
Kempner. The other lower bounds for G(k) were investigated systematically by Hardy and 
Littlewood, Proc. London Math. Soc. (2) 28 (1928), 518-42. 

§§ 21.7-8. For the results of these sections see Wright, Journal London Math. Soc. 9
(1934), 267-72, where further references are given; Mordell, ibid. 11 (1936), 208-18; and
Richmond, ibid. 12 (1937), 206.

Hunter, Journal London Math. Soc. 16 (1941), 177-9 proved that $9 \le v(4) \le 10$; we
have incorporated in the text his simple proof that $v(4) \ge 9$. For inequalities satisfied by
v(k) for $6 \le k \le 20$, see Fuchs and Wright, Quart. J. Math. (Oxford), 10 (1939), 190-209
and Wright, J. für Math. 311/312 (1979), 170-3.

Vaserstein has shown that $v(8) \le 28$ (J. Number Theory 28 (1988), 66-68), and
A. Choudhry has proved that $v(7) \le 12$ (J. Number Theory 81 (2000), 266-9). Both
conclusions depend on the existence of remarkable polynomial identities too lengthy to
record here.

§§ 21.9-10. Prouhet [Comptes Rendus Paris, 33 (1851), 225] found the first non-trivial
result in this problem. He gave a rule to separate the first $j^{k+1}$ positive integers into j sets
of $j^k$ members, which provide a solution of (21.9.3) with $s = j^k$. For a simple proof of
Prouhet’s rule, see Wright, Proc. Edinburgh Math. Soc. (2) 8 (1949), 138-42. See Dickson,
History, ii, ch. xxiv, and Gloden and Palamà, Bibliographie des Multigrades (Luxembourg,
1948), for general references. Theorem 408 is due to Bastien [Sphinx-Oedipe 8 (1913),
171-2] and Theorem 409 to Wright [Bull. American Math. Soc. 54 (1948), 755-7].

§ 21.10. Theorem 410 is due to Gloden [Mehrgradige Gleichungen, Groningen, 1944,
71-90]. For Theorem 411, see Tarry, L’intermédiaire des mathématiciens, 20 (1913), 68-70,
and Escott, Quarterly Journal of Math. 41 (1910), 152.

A. L6tac found the examples 


[1,25,31,84,87, 134, 158, 182, 198]g = [2, 18,42,66, 113, 116, 169, 175, 199]g 


and 


[± 1 2, ±1 1 88 1 , ±2023 1 , ±20885, ±23738] 9 

= [±436, ± 1 1 857, ±20449, ±20667, ±23750] 9 , 


which show that P(k, 2) = k + 1 for k = 8 and k — 9. See A. Letac, Gazeta Matematica 
48 (1942), 68-69, and A. Gloden, loc. cit. 

P. Borwein, Lisoněk and Percival (Math. Comp. 72 (2003), 2063-70) found the example

$[\pm 99, \pm 100, \pm 188, \pm 301, \pm 313]_9 = [\pm 71, \pm 131, \pm 180, \pm 307, \pm 308]_9$,

which provides a smaller solution than that available earlier, again confirming that $P(k, 2) =
k + 1$ for $k = 9$. As the result of what is probably best described as independently joint
work of Shuwen Chen, Kuosa, and Meyrignac (see http://euler.free.fr/eslp/eslp.htm for
more details), in 1999 an example equivalent to

$[\pm 22, \pm 61, \pm 86, \pm 127, \pm 140, \pm 151]_{11} = [\pm 35, \pm 47, \pm 94, \pm 121, \pm 146, \pm 148]_{11}$

was discovered that confirms that $P(k, 2) = k + 1$ for $k = 11$.
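The examples quoted above can be checked mechanically. The following is a minimal sketch, assuming that the notation $[a_1, \ldots, a_s]_k = [b_1, \ldots, b_s]_k$ means that the two sets have equal sums of $j$-th powers for $1 \le j \le k$, and that $\pm a$ stands for the two members $a$ and $-a$; the function names are our own.

```python
def is_multigrade(a, b, k):
    """True if sum(x**j for x in a) == sum(y**j for y in b) for every j = 1..k."""
    return all(sum(x ** j for x in a) == sum(y ** j for y in b)
               for j in range(1, k + 1))


def symmetric(values):
    """Expand [v1, v2, ...] into [v1, -v1, v2, -v2, ...] (the +/- notation)."""
    return [sign * v for v in values for sign in (1, -1)]


letac_8 = ([1, 25, 31, 84, 87, 134, 158, 182, 198],
           [2, 18, 42, 66, 113, 116, 169, 175, 199])
letac_9 = (symmetric([12, 11881, 20231, 20885, 23738]),
           symmetric([436, 11857, 20449, 20667, 23750]))
borwein_9 = (symmetric([99, 100, 188, 301, 313]),
             symmetric([71, 131, 180, 307, 308]))
chen_11 = (symmetric([22, 61, 86, 127, 140, 151]),
           symmetric([35, 47, 94, 121, 146, 148]))

for name, (a, b), k in [("Letac, k = 8", letac_8, 8),
                        ("Letac, k = 9", letac_9, 9),
                        ("Borwein et al., k = 9", borwein_9, 9),
                        ("Chen et al., k = 11", chen_11, 11)]:
    print(name, len(a), "terms:", is_multigrade(a, b, k))
```

Each solution uses $k + 1$ terms on each side, which is the least number possible.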

§ 21.11. The most important result in this section is Theorem 412. The relations (21.11.9)-
(21.11.12) are due to Vieta; they were used by Fermat to find solutions of (21.11.14) for
any $r$ (see Dickson, History, ii. 550-1). Fermat assumed without proof that all the pairs $x_s$,
$y_s$ ($s = 1, 2, \ldots, r$) would be different. The first complete proof was found by Mordell,
but not published.

Of the other identities and equations which we quote, (21.11.4) is due to Gérardin
[L’intermédiaire des math. 19 (1912), 7] and the corollary to Mahler [Journal London
Math. Soc. 11 (1936), 136-8], (21.11.6) to Sastry [ibid. 9 (1934), 242-6], the parametric
solution of (21.11.15) to Swinnerton-Dyer [Proc. Cambridge Phil. Soc. 48 (1952),
516-8], (21.11.16) to Moessner [Proc. Ind. Math. Soc. A 10 (1939), 296-306], (21.11.17)
to Subba Rao [Journal London Math. Soc. 9 (1934), 172-3], and (21.11.5) to Norrie.
Patterson found a further solution and Leech 6 further solutions of (21.11.2) for $k = 4$
[Bull. Amer. Math. Soc. 48 (1942), 736 and Proc. Cambridge Phil. Soc. 54 (1958), 554-
5]. The identities quoted in the footnote to p. 441 were found by Fauquembergue and
Gérardin respectively. For detailed references to the work of Norrie and the last two authors
and to much similar work, see Dickson, History, ii. 650-4. Lander and Parkin [Math.
Computation 21 (1967), 101-3] found the result which disproves Euler’s conjecture for
$k = 5$, $s = 4$. Elkies [Math. Comp. 51 (1988), 825-35] has found solutions of (21.11.1)
which disprove it for $k = 4$, $s = 3$. The smallest counterexample, computed by Frye, is
$95800^4 + 217519^4 + 414560^4 = 422481^4$. Brudno (Math. Comp. 30 (1976), 646-8) gives
a two-parameter solution of the equation $x_1^6 + x_2^6 + x_3^6 = y_1^6 + y_2^6 + y_3^6$, of which (21.11.17)
is a particular solution.
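Both counterexamples to Euler’s conjecture are quick to confirm with exact integer arithmetic. The sketch below checks Frye’s identity as quoted above; the fifth-power identity $27^5 + 84^5 + 110^5 + 133^5 = 144^5$ is not written out in these notes and is included here on the assumption that it is the Lander and Parkin result referred to.

```python
# Frye's smallest counterexample for k = 4, s = 3 (quoted in the notes above).
frye_lhs = 95800 ** 4 + 217519 ** 4 + 414560 ** 4
frye_rhs = 422481 ** 4

# Assumed to be the Lander-Parkin example for k = 5, s = 4.
lander_parkin_lhs = 27 ** 5 + 84 ** 5 + 110 ** 5 + 133 ** 5
lander_parkin_rhs = 144 ** 5

print("k = 4:", frye_lhs == frye_rhs)
print("k = 5:", lander_parkin_lhs == lander_parkin_rhs)
```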

For a survey of the subject of equal sums of like powers see Lander, American Math. 
Monthly 75 (1968), 1061-73. 
22.1. The functions $\vartheta(x)$ and $\psi(x)$. In this chapter we return to the
problems concerning the distribution of primes of which we gave a pre- 
liminary account in the first two chapters. There we proved nothing except 
Euclid’s Theorem 4 and the slight extensions contained in §§ 2.1-6. Here 
we develop the theory much further and, in particular, prove Theorem 6 
(the Prime Number Theorem). We begin, however, by proving the much 
simpler Theorem 7. 

Our proof of Theorems 6 and 7 depends upon the properties of a function
$\psi(x)$ and (to a lesser extent) of a function $\vartheta(x)$. We write†

(22.1.1) $\vartheta(x) = \sum_{p \le x} \log p = \log \prod_{p \le x} p$

and

(22.1.2) $\psi(x) = \sum_{p^m \le x} \log p = \sum_{n \le x} \Lambda(n)$

(in the notation of § 17.7). Thus

$\psi(10) = 3 \log 2 + 2 \log 3 + \log 5 + \log 7$,

there being a contribution $\log 2$ from 2, 4, and 8, and a contribution $\log 3$
from 3 and 9. If $p^m$ is the highest power of $p$ not exceeding $x$, $\log p$ occurs $m$
times in $\psi(x)$. Also $p^m$ is the highest power of $p$ which divides any number
up to $x$, so that

(22.1.3) $\psi(x) = \log U(x)$,

where $U(x)$ is the least common multiple of all numbers up to $x$. We can
also express $\psi(x)$ in the form


(22.1.4) $\psi(x) = \sum_{p \le x} \left[ \frac{\log x}{\log p} \right] \log p$.



† Throughout this chapter $x$ (and $y$ and $t$) are not necessarily integral. On the other hand, $m$, $n$, $h$, $k$,
etc., are positive integers and $p$, as usual, is a prime. We suppose always that $x \ge 1$.
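The definitions (22.1.1) and (22.1.2) and the identity (22.1.3) are easy to test numerically. The following is a minimal sketch; the helper names `primes_up_to`, `theta`, `psi` and `log_lcm` are our own, and floating-point logarithms are used, so equalities hold only up to rounding.

```python
import math


def primes_up_to(n):
    """Primes p <= n by the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def theta(x):
    """(22.1.1): sum of log p over primes p <= x."""
    return sum(math.log(p) for p in primes_up_to(int(x)))


def psi(x):
    """(22.1.2): log p is counted once for every power p^m <= x."""
    total = 0.0
    for p in primes_up_to(int(x)):
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total


def log_lcm(n):
    """log U(n), where U(n) is the least common multiple of 1, 2, ..., n."""
    return math.log(math.lcm(*range(1, n + 1)))


print(psi(10), 3 * math.log(2) + 2 * math.log(3) + math.log(5) + math.log(7))
print(psi(100), log_lcm(100), theta(100))
```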
The definitions of $\vartheta(x)$ and $\psi(x)$ are more complicated than that of $\pi(x)$, but they
are in reality more ‘natural’ functions. Thus $\psi(x)$ is, after (22.1.2), the ‘sum function’ of
$\Lambda(n)$, and $\Lambda(n)$ has (as we saw in § 17.7) a simple generating function. The generating
functions of $\vartheta(x)$, and still more of $\pi(x)$, are much more complicated. And even the
arithmetical definition of $\psi(x)$, when written in the form (22.1.3), is very elementary and
natural.

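Since $p^2 \le x$, $p^3 \le x$, ... are equivalent to $p \le x^{1/2}$, $p \le x^{1/3}$, ..., we have

(22.1.5) $\psi(x) = \vartheta(x) + \vartheta(x^{1/2}) + \vartheta(x^{1/3}) + \cdots = \sum_{m} \vartheta(x^{1/m})$.

The series breaks off when $x^{1/m} < 2$, i.e. when

$m > \frac{\log x}{\log 2}$.

It is obvious from the definition that $\vartheta(x) < x \log x$ for $x \ge 2$. A fortiori

$\vartheta(x^{1/m}) < x^{1/m} \log x \le x^{1/2} \log x$

if $m \ge 2$; and

$\sum_{m \ge 2} \vartheta(x^{1/m}) = O\{x^{1/2} (\log x)^2\}$,

since there are only $O(\log x)$ terms in the series. Hence

Theorem 413:

$\psi(x) = \vartheta(x) + O\{x^{1/2} (\log x)^2\}$.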

We are interested in the order of magnitude of the functions. Since

$\pi(x) = \sum_{p \le x} 1$, $\vartheta(x) = \sum_{p \le x} \log p$,

it is natural to expect $\vartheta(x)$ to be ‘about $\log x$ times’ $\pi(x)$. We shall see later that this is so.
We prove next that $\vartheta(x)$ is of order $x$, so that Theorem 413 tells us that $\psi(x)$ is ‘about the
same as’ $\vartheta(x)$ when $x$ is large.
22.2. Proof that $\vartheta(x)$ and $\psi(x)$ are of order $x$. We now prove

Theorem 414. The functions $\vartheta(x)$ and $\psi(x)$ are of order $x$:

(22.2.1) $Ax < \vartheta(x) < Ax$, $Ax < \psi(x) < Ax$ ($x \ge 2$).

It is enough, after Theorem 413, to prove that

(22.2.2) $\vartheta(x) < Ax$

and

(22.2.3) $\psi(x) > Ax$ ($x \ge 2$).

In fact, we prove a result a little more precise than (22.2.2), viz.

Theorem 415:

$\vartheta(n) < 2n \log 2$ for all $n \ge 1$.

By Theorem 73,

$M = \frac{(2m+1)!}{m!\,(m+1)!} = \frac{(2m+1)(2m) \cdots (m+2)}{m!}$

is an integer. It occurs twice in the binomial expansion of $(1 + 1)^{2m+1}$ and
so $2M < 2^{2m+1}$ and $M < 2^{2m}$.

If $m + 1 < p \le 2m + 1$, $p$ divides the numerator but not the denominator
of $M$. Hence

$\left( \prod_{m+1 < p \le 2m+1} p \right) \Big|\, M$

and

$\vartheta(2m+1) - \vartheta(m+1) = \sum_{m+1 < p \le 2m+1} \log p \le \log M < 2m \log 2$.

Theorem 415 is trivial for $n = 1$ and for $n = 2$. Let us suppose it true
for all $n \le n_0 - 1$. If $n_0$ is even, we have

$\vartheta(n_0) = \vartheta(n_0 - 1) < 2(n_0 - 1) \log 2 < 2 n_0 \log 2$.

If $n_0$ is odd, say $n_0 = 2m + 1$, we have

$\vartheta(n_0) = \vartheta(2m+1) = \vartheta(2m+1) - \vartheta(m+1) + \vartheta(m+1)$
$< 2m \log 2 + 2(m+1) \log 2$
$= 2(2m+1) \log 2 = 2 n_0 \log 2$,

since $m + 1 < n_0$. Hence Theorem 415 is true for $n = n_0$ and so, by
induction, for all $n$. The inequality (22.2.2) follows at once.
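Theorem 415 can be confirmed directly over a finite range. The following is a minimal sketch, assuming a bound of $10^4$ chosen only for illustration; `theta_values` is our own helper, which tabulates $\vartheta(n)$ with a running sum.

```python
import math


def theta_values(limit):
    """Return a list theta with theta[n] = sum of log p over primes p <= n."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    theta = [0.0] * (limit + 1)
    running = 0.0
    for n in range(2, limit + 1):
        if is_prime[n]:
            running += math.log(n)
        theta[n] = running
    return theta


LIMIT = 10_000  # illustrative bound
theta = theta_values(LIMIT)
theorem_415_holds = all(theta[n] < 2 * n * math.log(2) for n in range(1, LIMIT + 1))
largest_ratio = max(theta[n] / n for n in range(1, LIMIT + 1))

print("theta(n) < 2 n log 2 for all n <=", LIMIT, ":", theorem_415_holds)
print("largest theta(n)/n in range:", largest_ratio, " bound:", 2 * math.log(2))
```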

We now prove (22.2.3). The numbers $1, 2, \ldots, n$ include just $[n/p]$
multiples of $p$, just $[n/p^2]$ multiples of $p^2$, and so on. Hence

Theorem 416:

$n! = \prod_{p} p^{j(n,p)}$,

where

$j(n,p) = \sum_{m \ge 1} \left[ \frac{n}{p^m} \right]$.

We write

$N = \frac{(2n)!}{(n!)^2} = \prod_{p \le 2n} p^{k_p}$,

so that, by Theorem 416,

(22.2.4) $k_p = \sum_{m=1}^{\infty} \left( \left[ \frac{2n}{p^m} \right] - 2 \left[ \frac{n}{p^m} \right] \right)$.

Each term in round brackets is 1 or 0, according as $[2n/p^m]$ is odd or even.
In particular, the term is 0 if $p^m > 2n$. Hence

(22.2.5) $k_p \le \left[ \frac{\log 2n}{\log p} \right]$

and

$\log N = \sum_{p \le 2n} k_p \log p \le \sum_{p \le 2n} \left[ \frac{\log 2n}{\log p} \right] \log p = \psi(2n)$

by (22.1.4). But

(22.2.6) $N = \frac{(2n)!}{(n!)^2} = \frac{n+1}{1} \cdot \frac{n+2}{2} \cdots \frac{2n}{n} \ge 2^n$

and so

$\psi(2n) \ge n \log 2$.

For $x \ge 2$, we put $n = [\frac{1}{2}x] \ge 1$ and have

$\psi(x) \ge \psi(2n) \ge n \log 2 \ge \frac{1}{4} x \log 2$,

which is (22.2.3).
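The formula of Theorem 416 and the properties of $k_p$ used above can be verified on the central binomial coefficient itself. This is a minimal sketch; `legendre`, `multiplicity` and `check_central_binomial` are names of our own choosing.

```python
import math


def legendre(n, p):
    """j(n, p) = sum over m >= 1 of [n / p^m], the exponent of p in n!."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def multiplicity(value, p):
    """Exponent of the prime p in the positive integer value."""
    exponent = 0
    while value % p == 0:
        value //= p
        exponent += 1
    return exponent


def check_central_binomial(n):
    """Check (22.2.4), (22.2.5) and (22.2.6) for N = (2n)! / (n!)^2."""
    big_n = math.comb(2 * n, n)
    primes = [p for p in range(2, 2 * n + 1)
              if all(p % q for q in range(2, math.isqrt(p) + 1))]
    for p in primes:
        k_p = legendre(2 * n, p) - 2 * legendre(n, p)  # (22.2.4)
        if k_p != multiplicity(big_n, p):
            return False
        if p ** k_p > 2 * n:  # equivalent to (22.2.5)
            return False
    return big_n >= 2 ** n  # (22.2.6)


print(all(check_central_binomial(n) for n in range(1, 200)))
```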

22.3. Bertrand’s postulate and a ‘formula’ for primes. From Theorem 414, we can
deduce

Theorem 417. There is a number $B$ such that, for every $x > 1$, there is a prime $p$
satisfying

$x < p \le Bx$.

For, by Theorem 414,

$C_1 x < \vartheta(x) < C_2 x$ ($x \ge 2$)

for some fixed $C_1$, $C_2$. Hence

$\vartheta(C_2 x / C_1) > C_1 (C_2 x / C_1) = C_2 x > \vartheta(x)$

and so there is a prime between $x$ and $C_2 x / C_1$. If we put $B = \max(C_2/C_1, 2)$, Theorem 417
is immediate.

We can, however, refine our argument a little to prove a more precise result. 

Theorem 418 (Bertrand’s Postulate). If $n \ge 1$, there is at least one prime $p$ such that

(22.3.1) $n < p \le 2n$;

that is, if $p_r$ is the $r$-th prime,

(22.3.2) $p_{r+1} < 2 p_r$

for every $r$.

The two parts of the theorem are clearly equivalent. Let us suppose that, for some
$n > 2^9 = 512$, there is no prime satisfying (22.3.1). With the notation of § 22.2, let $p$ be a
prime factor of $N$, so that $k_p \ge 1$. By our hypothesis, $p \le n$. If $\frac{2}{3}n < p \le n$, we have

$2p \le 2n < 3p$, $p^2 > \frac{4}{9} n^2 > 2n$

and (22.2.4) becomes

$k_p = \left[ \frac{2n}{p} \right] - 2 \left[ \frac{n}{p} \right] = 2 - 2 = 0$.

Hence $p \le \frac{2}{3}n$ for every prime factor $p$ of $N$ and so

(22.3.3) $\sum_{p | N} \log p \le \sum_{p \le \frac{2}{3}n} \log p = \vartheta(\tfrac{2}{3}n) \le \tfrac{4}{3} n \log 2$

by Theorem 415.

Next, if $k_p \ge 2$, we have, by (22.2.5)

$2 \log p \le k_p \log p \le \log(2n)$, $p \le \sqrt{2n}$

and so there are at most $\sqrt{2n}$ such values of $p$. Hence

$\sum_{k_p \ge 2} k_p \log p \le \sqrt{2n} \log(2n)$,

and so

(22.3.4) $\log N \le \sum_{k_p = 1} \log p + \sum_{k_p \ge 2} k_p \log p \le \sum_{p | N} \log p + \sqrt{2n} \log(2n)$
$\le \tfrac{4}{3} n \log 2 + \sqrt{2n} \log(2n)$

by (22.3.3).

On the other hand, $N$ is the largest term in the expansion of $2^{2n} = (1 + 1)^{2n}$, so that

$2^{2n} = 2 + \binom{2n}{1} + \binom{2n}{2} + \cdots + \binom{2n}{2n-1} \le 2nN$.

Hence, by (22.3.4),

$2n \log 2 \le \log(2n) + \log N \le \tfrac{4}{3} n \log 2 + \{1 + \sqrt{2n}\} \log(2n)$,

which reduces to

(22.3.5) $2n \log 2 \le 3 \{1 + \sqrt{2n}\} \log(2n)$.

We now write

$\zeta = \frac{\log(n/512)}{10 \log 2} > 0$,

so that $2n = 2^{10(1+\zeta)}$. Since $n > 512$, we have $\zeta > 0$. (22.3.5) becomes

$2^{10(1+\zeta)} \le 30 \left( 2^{5+5\zeta} + 1 \right) (1 + \zeta)$,

whence

$2^{5\zeta} \le 30 \cdot 2^{-5} \left( 1 + 2^{-5-5\zeta} \right) (1 + \zeta) < (1 - 2^{-5})(1 + 2^{-5})(1 + \zeta) < 1 + \zeta$.

But

$2^{5\zeta} = \exp(5\zeta \log 2) > 1 + 5\zeta \log 2 > 1 + \zeta$,

a contradiction. Hence, if $n > 512$, there must be a prime satisfying (22.3.1).
Each of the primes

2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631

is less than twice its predecessor in the list. Hence one of them, at least, satisfies (22.3.1)
for any $n \le 630$. This completes the proof of Theorem 418.
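The finite part of the argument lends itself to a direct check. The following is a minimal sketch that confirms the chain of primes above and also tests (22.3.1) by brute force for every $n \le 630$; the helper names are our own.

```python
import math


def is_prime(n):
    """Trial division; adequate for the small range needed here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def bertrand_holds(n):
    """True if some prime p satisfies n < p <= 2n, as in (22.3.1)."""
    return any(is_prime(p) for p in range(n + 1, 2 * n + 1))


chain = [2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631]
chain_is_valid = (all(is_prime(q) for q in chain)
                  and all(b < 2 * a for a, b in zip(chain, chain[1:])))
small_cases_hold = all(bertrand_holds(n) for n in range(1, 631))

print("chain valid:", chain_is_valid)
print("(22.3.1) holds for 1 <= n <= 630:", small_cases_hold)
```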

We prove next 

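Theorem 419. If

$\alpha = \sum_{m=1}^{\infty} p_m 10^{-2^m} = \cdot 02030005000000070 \ldots$,

we have

(22.3.6) $p_n = [10^{2^n} \alpha] - 10^{2^{n-1}} [10^{2^{n-1}} \alpha]$.

By (2.2.2),

$p_m < 2^{2^m} = 4^{2^{m-1}}$

and so the series for $\alpha$ is convergent. Again

$0 < 10^{2^n} \sum_{m=n+1}^{\infty} p_m 10^{-2^m} < \sum_{m=n+1}^{\infty} 4^{2^{m-1}} 10^{-2^{m-1}}$
$= \sum_{m=n+1}^{\infty} \left( \tfrac{2}{5} \right)^{2^{m-1}} < \left( \tfrac{2}{5} \right)^{2^n} \frac{1}{1 - \frac{2}{5}} < 1$.

Hence

$[10^{2^n} \alpha] = 10^{2^n} \sum_{m=1}^{n} p_m 10^{-2^m}$

and, similarly,

$[10^{2^{n-1}} \alpha] = 10^{2^{n-1}} \sum_{m=1}^{n-1} p_m 10^{-2^m}$.

It follows that

$[10^{2^n} \alpha] - 10^{2^{n-1}} [10^{2^{n-1}} \alpha] = 10^{2^n} \left( \sum_{m=1}^{n} p_m 10^{-2^m} - \sum_{m=1}^{n-1} p_m 10^{-2^m} \right) = p_n$.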
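The formula (22.3.6) can be exercised with exact rational arithmetic. The sketch below assumes a truncation of $\alpha$ to its first six terms (the omitted tail is smaller than the tail bounded above, so the integer parts are unaffected for $n \le 6$); it also makes the circularity plain, since the primes must be known in order to build $\alpha$.

```python
from fractions import Fraction
from math import floor

PRIMES = [2, 3, 5, 7, 11, 13]  # p_1, ..., p_6, needed to build alpha


def alpha(terms):
    """Partial sum of alpha = sum over m >= 1 of p_m * 10^(-2^m)."""
    return sum(Fraction(p, 10 ** (2 ** m))
               for m, p in enumerate(PRIMES[:terms], start=1))


def prime_from_alpha(n, a):
    """(22.3.6): p_n = [10^(2^n) a] - 10^(2^(n-1)) [10^(2^(n-1)) a]."""
    upper = floor(10 ** (2 ** n) * a)
    lower = floor(10 ** (2 ** (n - 1)) * a)
    return upper - 10 ** (2 ** (n - 1)) * lower


a = alpha(6)
print(float(a))
print([prime_from_alpha(n, a) for n in range(1, 7)])
```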

Although (22.3.6) gives a ‘formula’ for the $n$-th prime $p_n$, it is not a very useful one. To
calculate $p_n$ from this formula, it is necessary to know the value of $\alpha$ correct to $2^n$ decimal
places; and to do this, it is necessary to know the values of $p_1, p_2, \ldots, p_n$.

There are a number of similar formulae which suffer from the same defect. Thus, let us
suppose that $r$ is an integer greater than one. We have then

$p_n \le r^n$

by (22.3.2). Indeed, for $r \ge 4$, this follows from Theorem 20. Hence we may write

$\alpha_r = \sum_{m=1}^{\infty} p_m r^{-m^2}$

and we can deduce that

$p_n = [r^{n^2} \alpha_r] - r^{2n-1} [r^{(n-1)^2} \alpha_r]$

by arguments similar to those used above.

Any one of these formulae (or any similar one) would attain a different status if the exact
value of the number $\alpha$ or $\alpha_r$ which occurs in it could be expressed independently of the
primes. There seems no likelihood of this, but it cannot be ruled out as entirely impossible.

For another formula for $p_n$, see § 1 of the Appendix.

22.4. Proof of Theorems 7 and 9. It is easy to deduce Theorem 7 from
Theorem 414. In the first place

$\vartheta(x) = \sum_{p \le x} \log p \le \log x \sum_{p \le x} 1 = \pi(x) \log x$

and so

(22.4.1) $\pi(x) \ge \frac{\vartheta(x)}{\log x} > \frac{Ax}{\log x}$.

On the other hand, if $0 < \delta < 1$,

$\vartheta(x) \ge \sum_{x^{1-\delta} < p \le x} \log p \ge (1 - \delta) \log x \sum_{x^{1-\delta} < p \le x} 1$
$= (1 - \delta) \log x \{\pi(x) - \pi(x^{1-\delta})\} \ge (1 - \delta) \log x \{\pi(x) - x^{1-\delta}\}$

and so

(22.4.2) $\pi(x) \le x^{1-\delta} + \frac{\vartheta(x)}{(1 - \delta) \log x} < \frac{Ax}{\log x}$.

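We can now prove

Theorem 420:

$\pi(x) \sim \frac{\vartheta(x)}{\log x} \sim \frac{\psi(x)}{\log x}$.

After Theorems 413 and 414 we need only consider the first assertion.
It follows from (22.4.1) and (22.4.2) that

$1 \le \frac{\pi(x) \log x}{\vartheta(x)} \le \frac{x^{1-\delta} \log x}{\vartheta(x)} + \frac{1}{1 - \delta}$.

For any $\epsilon > 0$, we can choose $\delta = \delta(\epsilon)$ so that

$\frac{1}{1 - \delta} < 1 + \tfrac{1}{2}\epsilon$

and then choose $x_0 = x_0(\delta, \epsilon) = x_0(\epsilon)$ so that

$\frac{x^{1-\delta} \log x}{\vartheta(x)} < \frac{A \log x}{x^{\delta}} < \tfrac{1}{2}\epsilon$

for all $x > x_0$. Hence

$1 \le \frac{\pi(x) \log x}{\vartheta(x)} < 1 + \epsilon$

for all $x > x_0$. Since $\epsilon$ is arbitrary, the first part of Theorem 420 follows at
once.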

Theorem 9 is (as stated in § 1.8) a corollary of Theorem 7. For, in the
first place,

$n = \pi(p_n) < \frac{A p_n}{\log p_n}$, $p_n > An \log p_n > An \log n$.

Secondly,

$n = \pi(p_n) > \frac{A p_n}{\log p_n}$,

so that

$\sqrt{p_n} < \frac{A p_n}{\log p_n} < An$, $p_n < An^2$,

and

$p_n < An \log p_n < An \log n$.
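The two statements just proved can be watched numerically. This is a minimal sketch, assuming a sieve bound of 200000 chosen only so that the 10000th prime is available; it prints the ratio $\pi(x) \log x / \vartheta(x)$ of Theorem 420 and the ratio $p_n / (n \log n)$ relevant to Theorem 9.

```python
import math


def primes_up_to(n):
    """Primes p <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


primes = primes_up_to(200_000)  # illustrative bound


def pi_theta_ratio(x):
    """pi(x) log x / theta(x), which is at least 1 by (22.4.1)."""
    below = [p for p in primes if p <= x]
    return len(below) * math.log(x) / sum(math.log(p) for p in below)


def nth_prime_ratio(n):
    """p_n / (n log n) for the n-th prime p_n."""
    return primes[n - 1] / (n * math.log(n))


for x in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    print(x, round(pi_theta_ratio(x), 4))
for n in (10, 100, 1000, 10_000):
    print(n, primes[n - 1], round(nth_prime_ratio(n), 4))
```

The first ratio tends to 1 only slowly, as the term $x^{1-\delta} \log x / \vartheta(x)$ in the proof suggests.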

22.5. Two formal transformations. We introduce here two elementary
formal transformations which will be useful throughout this chapter.

Theorem 421. Suppose that $c_1, c_2, \ldots$ is a sequence of numbers, that

$C(t) = \sum_{n \le t} c_n$,

and that $f(t)$ is any function of $t$. Then

(22.5.1) $\sum_{n \le x} c_n f(n) = \sum_{n \le x-1} C(n) \{f(n) - f(n+1)\} + C(x) f([x])$.

If, in addition, $c_j = 0$ for $j < n_1$† and $f(t)$ has a continuous derivative for
$t \ge n_1$, then

(22.5.2) $\sum_{n \le x} c_n f(n) = C(x) f(x) - \int_{n_1}^{x} C(t) f'(t)\, dt$.

If we write $N = [x]$, the sum on the left of (22.5.1) is

$C(1) f(1) + \{C(2) - C(1)\} f(2) + \cdots + \{C(N) - C(N-1)\} f(N)$
$= C(1) \{f(1) - f(2)\} + \cdots + C(N-1) \{f(N-1) - f(N)\} + C(N) f(N)$.

Since $C(N) = C(x)$, this proves (22.5.1). To deduce (22.5.2) we observe
that $C(t) = C(n)$ when $n \le t < n + 1$ and so

$C(n) \{f(n) - f(n+1)\} = - \int_{n}^{n+1} C(t) f'(t)\, dt$.

Also $C(t) = 0$ when $t < n_1$.

† In our applications, $n_1 = 1$ or 2. If $n_1 = 1$, there is, of course, no restriction on the $c_n$. If $n_1 = 2$,
we have $c_1 = 0$.
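The identity (22.5.1) is purely algebraic, so it can be checked exactly. The sketch below takes $x = N$ to be an integer and uses the illustrative choices $c_n = n^2$ and $f(n) = 1/n$, neither of which comes from the text.

```python
from fractions import Fraction


def direct_sum(c, f, n_max):
    """Left side of (22.5.1): sum of c_n f(n) for 1 <= n <= n_max."""
    return sum(c(n) * f(n) for n in range(1, n_max + 1))


def summed_by_parts(c, f, n_max):
    """Right side of (22.5.1), with x = n_max an integer."""
    partial = [0] * (n_max + 1)  # partial[n] = C(n) = c_1 + ... + c_n
    for n in range(1, n_max + 1):
        partial[n] = partial[n - 1] + c(n)
    head = sum(partial[n] * (f(n) - f(n + 1)) for n in range(1, n_max))
    return head + partial[n_max] * f(n_max)


def c(n):
    return n * n  # illustrative coefficients


def f(n):
    return Fraction(1, n)  # illustrative function, kept exact


print(direct_sum(c, f, 50), summed_by_parts(c, f, 50))
```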
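If we put $c_n = 1$ and $f(t) = 1/t$, we have $C(x) = [x]$ and (22.5.2)
becomes

$\sum_{n \le x} \frac{1}{n} = \frac{[x]}{x} + \int_{1}^{x} \frac{[t]}{t^2}\, dt = \log x + \gamma + E$,

where

$\gamma = 1 - \int_{1}^{\infty} \frac{t - [t]}{t^2}\, dt$

is independent of $x$ and

$E = \int_{x}^{\infty} \frac{t - [t]}{t^2}\, dt - \frac{x - [x]}{x} = O\left( \frac{1}{x} \right)$.

Thus we have

Theorem 422:

$\sum_{n \le x} \frac{1}{n} = \log x + \gamma + O\left( \frac{1}{x} \right)$,

where $\gamma$ is a constant (known as Euler’s constant).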
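Theorem 422 can be illustrated by tabulating the difference between the harmonic sum and $\log x$. This is a minimal sketch; the decimal value used for $\gamma$ is the standard one and is not taken from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # standard value of Euler's constant


def harmonic(n):
    """Sum of 1/k for 1 <= k <= n, accumulated with math.fsum for accuracy."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def error_term(n):
    """harmonic(n) - log n - gamma, which Theorem 422 says is O(1/n)."""
    return harmonic(n) - math.log(n) - EULER_GAMMA


for n in (10, 100, 1000, 10 ** 5):
    print(n, error_term(n), n * error_term(n))
```

The last column stays bounded, in agreement with the error term $O(1/x)$.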

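22.6. An important sum. We prove first the lemma

Theorem 423:

$\sum_{n \le x} \log^h \left( \frac{x}{n} \right) = O(x)$ ($h > 0$).

Since $\log t$ increases with $t$, we have, for $n \ge 2$,

$\log^h \left( \frac{x}{n} \right) \le \int_{n-1}^{n} \log^h \left( \frac{x}{t} \right) dt$.

Hence

$\sum_{n=2}^{[x]} \log^h \left( \frac{x}{n} \right) \le \int_{1}^{x} \log^h \left( \frac{x}{t} \right) dt = x \int_{1}^{x} \frac{\log^h u}{u^2}\, du < x \int_{1}^{\infty} \frac{\log^h u}{u^2}\, du = Ax$,

since the infinite integral is convergent. Theorem 423 follows at once.

If we put $h = 1$, we have

$\sum_{n \le x} \log n = [x] \log x + O(x) = x \log x + O(x)$.

But, by Theorem 416,

$\sum_{n \le x} \log n = \sum_{p \le x} j([x], p) \log p = \sum_{p^m \le x} \left[ \frac{x}{p^m} \right] \log p = \sum_{n \le x} \left[ \frac{x}{n} \right] \Lambda(n)$

in the notation of § 17.7. If we remove the square brackets in the last sum,
we introduce an error less than

$\sum_{n \le x} \Lambda(n) = \psi(x) = O(x)$

and so

$\sum_{n \le x} \frac{x}{n} \Lambda(n) = \sum_{n \le x} \log n + O(x) = x \log x + O(x)$.

If we remove a factor $x$, we have

Theorem 424:

$\sum_{n \le x} \frac{\Lambda(n)}{n} = \log x + O(1)$.

From this we can deduce

Theorem 425:

$\sum_{p \le x} \frac{\log p}{p} = \log x + O(1)$.

For

$\sum_{n \le x} \frac{\Lambda(n)}{n} - \sum_{p \le x} \frac{\log p}{p} = \sum_{m \ge 2} \sum_{p^m \le x} \frac{\log p}{p^m}$
$< \sum_{p} \left( \frac{1}{p^2} + \frac{1}{p^3} + \cdots \right) \log p = \sum_{p} \frac{\log p}{p(p-1)} < \sum_{n=2}^{\infty} \frac{\log n}{n(n-1)} = A$.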
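Theorems 424 and 425 assert that two differences remain bounded. The following is a minimal sketch that tabulates them; `von_mangoldt_table` is our own helper, and the range of $x$ is an illustrative choice.

```python
import math


def von_mangoldt_table(limit):
    """Return (lam, primes): lam[n] = Lambda(n) for n <= limit, and the primes <= limit."""
    lam = [0.0] * (limit + 1)
    composite = bytearray(limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if not composite[p]:
            primes.append(p)
            for multiple in range(p * p, limit + 1, p):
                composite[multiple] = 1
            power = p
            while power <= limit:  # Lambda(p^m) = log p
                lam[power] = math.log(p)
                power *= p
    return lam, primes


LIMIT = 10 ** 5  # illustrative bound
lam, primes = von_mangoldt_table(LIMIT)


def theorem_424_difference(x):
    """Sum of Lambda(n)/n for n <= x, minus log x."""
    return math.fsum(lam[n] / n for n in range(1, x + 1)) - math.log(x)


def theorem_425_difference(x):
    """Sum of (log p)/p for primes p <= x, minus log x."""
    return math.fsum(math.log(p) / p for p in primes if p <= x) - math.log(x)


for x in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    print(x, theorem_424_difference(x), theorem_425_difference(x))
```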

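If, in (22.5.2), we put $f(t) = 1/t$ and $c_n = \Lambda(n)$, so that $C(x) = \psi(x)$,
we have

$\sum_{n \le x} \frac{\Lambda(n)}{n} = \frac{\psi(x)}{x} + \int_{2}^{x} \frac{\psi(t)}{t^2}\, dt$

and so, by Theorems 414 and 424, we have

(22.6.1) $\int_{2}^{x} \frac{\psi(t)}{t^2}\, dt = \log x + O(1)$.

From (22.6.1) we can deduce

(22.6.2) $\liminf \{\psi(x)/x\} \le 1$, $\limsup \{\psi(x)/x\} \ge 1$.

For, if $\liminf \{\psi(x)/x\} = 1 + \delta$, where $\delta > 0$, we have $\psi(x) > (1 + \frac{1}{2}\delta)x$
for all $x$ greater than some $x_0$. Hence

$\int_{2}^{x} \frac{\psi(t)}{t^2}\, dt > \int_{2}^{x_0} \frac{\psi(t)}{t^2}\, dt + \int_{x_0}^{x} \frac{1 + \frac{1}{2}\delta}{t}\, dt > (1 + \tfrac{1}{2}\delta) \log x - A$,

in contradiction to (22.6.1). If we suppose that $\limsup \{\psi(x)/x\} = 1 - \delta$, we
get a similar contradiction.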

By Theorem 420, we can deduce from (22.6.2)

Theorem 426:

$\liminf \left\{ \pi(x) \Big/ \frac{x}{\log x} \right\} \le 1$, $\limsup \left\{ \pi(x) \Big/ \frac{x}{\log x} \right\} \ge 1$.

If $\pi(x) \big/ \frac{x}{\log x}$ tends to a limit as $x \to \infty$, the limit is 1.

Theorem 6 would follow at once if we could prove that $\pi(x) \big/ \frac{x}{\log x}$ tends
to a limit. Unfortunately this is the real difficulty in the proof of Theorem 6.

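22.7. The sum $\sum p^{-1}$ and the product $\prod (1 - p^{-1})$. Since

(22.7.1) $0 < \log \left( \frac{1}{1 - p^{-1}} \right) - \frac{1}{p} = \frac{1}{2p^2} + \frac{1}{3p^3} + \cdots$
$< \frac{1}{2p^2} + \frac{1}{2p^3} + \cdots = \frac{1}{2p(p-1)}$

and

$\sum \frac{1}{p(p-1)}$

is convergent, the series

$\sum \left\{ \log \left( \frac{1}{1 - p^{-1}} \right) - \frac{1}{p} \right\}$

must be convergent. By Theorem 19, $\sum p^{-1}$ is divergent and so the product

(22.7.2) $\prod (1 - p^{-1})$

must diverge also (to zero).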

From the divergence of the product (22.7.2) we can deduce that

$\pi(x) = o(x)$,

i.e. almost all numbers are composite, without using any of the results of §§ 22.1-6. Of
course, this result is weaker than Theorem 7, but the very simple proof is of some interest.
We choose $r$ so that

$M = p_1 p_2 \cdots p_r \le x < p_1 p_2 \cdots p_r p_{r+1}$

and $k$ the positive integer such that $kM \le x < (k+1)M$. Let $H$ be the number of
positive integers which (i) do not exceed $(k+1)M$ and (ii) are not divisible by any of
the primes $p_1, \ldots, p_r$, i.e. are prime to $M$. These numbers clearly include all the primes
$p_{r+1}, \ldots, p_{\pi(x)}$. Hence

$\pi(x) \le r + H$.

By definition $\phi(M)$ is the number of integers prime to $M$ and less than or equal to $M$, so
that $H = (k+1)\phi(M)$. But $x \ge kM$ and so, by (16.1.3),

$\frac{H}{x} \le \frac{(k+1)\phi(M)}{kM} \le \frac{2\phi(M)}{M} = 2 \prod_{i=1}^{r} \left( 1 - \frac{1}{p_i} \right)$, $\frac{r}{x} \le \frac{r}{M} \le \frac{r}{p_{r-1} p_r} < \frac{1}{p_{r-1}}$.

As $x \to \infty$, so does $r$ and we have

$\frac{\pi(x)}{x} \le \frac{r}{x} + \frac{H}{x} \to 0$,

that is, $\pi(x) = o(x)$.

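We can prove the divergence of $\prod (1 - p^{-1})$ independently of that of
$\sum p^{-1}$ as follows. It is plain that

$\prod_{p \le N} \left( \frac{1}{1 - p^{-1}} \right) = \prod_{p \le N} \left( 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right) = \sum_{(N)} \frac{1}{n}$,

the last sum being extended over all $n$ composed of prime factors $p \le N$.
Since all $n \le N$ satisfy this condition,

$\prod_{p \le N} \left( \frac{1}{1 - p^{-1}} \right) \ge \sum_{n=1}^{N} \frac{1}{n} > \log N - A$

by Theorem 422. Hence the product (22.7.2) is divergent.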

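If we use the results of the last two sections, we can obtain much more
exact information about $\sum p^{-1}$. In Theorem 421, let us put $c_p = \log p / p$,
and $c_n = 0$ if $n$ is not a prime, so that

$C(x) = \sum_{p \le x} \frac{\log p}{p} = \log x + \tau(x)$,

where $\tau(x) = O(1)$ by Theorem 425. With $f(t) = 1/\log t$, (22.5.2) becomes

(22.7.3) $\sum_{p \le x} \frac{1}{p} = \frac{C(x)}{\log x} + \int_{2}^{x} \frac{C(t)}{t \log^2 t}\, dt$
$= 1 + \frac{\tau(x)}{\log x} + \int_{2}^{x} \frac{dt}{t \log t} + \int_{2}^{x} \frac{\tau(t)\, dt}{t \log^2 t}$
$= \log\log x + B_1 + E(x)$,

where

$B_1 = 1 - \log\log 2 + \int_{2}^{\infty} \frac{\tau(t)\, dt}{t \log^2 t}$

and

(22.7.4) $E(x) = \frac{\tau(x)}{\log x} - \int_{x}^{\infty} \frac{\tau(t)\, dt}{t \log^2 t} = O\left( \frac{1}{\log x} \right) + O\left( \int_{x}^{\infty} \frac{dt}{t \log^2 t} \right) = O\left( \frac{1}{\log x} \right)$.

Hence we have

Theorem 427:

$\sum_{p \le x} \frac{1}{p} = \log\log x + B_1 + o(1)$,

where $B_1$ is a constant.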
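The constant $B_1$ of Theorem 427 can be estimated by subtracting $\log\log x$ from the sum of reciprocals of the primes. This is a minimal sketch; the comparison value 0.2614972 is the commonly quoted decimal for this constant and is an assumption, not something stated in the text.

```python
import math


def primes_up_to(n):
    """Primes p <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


primes = primes_up_to(10 ** 6)  # illustrative bound


def b1_estimate(x):
    """Sum of 1/p over primes p <= x, minus log log x."""
    return math.fsum(1.0 / p for p in primes if p <= x) - math.log(math.log(x))


for x in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(x, b1_estimate(x))
```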

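22.8. Mertens’s theorem. It is interesting to push our study of the series
and product of the last section a little further.

Theorem 428. In Theorem 427,

(22.8.1) $B_1 = \gamma + \sum_{p} \left\{ \log \left( 1 - \frac{1}{p} \right) + \frac{1}{p} \right\}$,

where $\gamma$ is Euler’s constant.

Theorem 429 (Mertens’s theorem):

$\prod_{p \le x} \left( 1 - \frac{1}{p} \right) \sim \frac{e^{-\gamma}}{\log x}$.

As we saw in § 22.7, the series in (22.8.1) converges. Since

$\sum_{p \le x} \frac{1}{p} + \sum_{p \le x} \log \left( 1 - \frac{1}{p} \right) = \sum_{p} \left\{ \log \left( 1 - \frac{1}{p} \right) + \frac{1}{p} \right\} + o(1)$,

Theorem 429 follows from Theorems 427 and 428. Hence it is enough to
prove Theorem 428. We shall assume that†

(22.8.2) $\gamma = -\Gamma'(1) = - \int_{0}^{\infty} e^{-x} \log x\, dx$.

If $\delta \ge 0$, we have

$0 < -\log \left( 1 - \frac{1}{p^{1+\delta}} \right) - \frac{1}{p^{1+\delta}} < \frac{1}{2p^{1+\delta}(p^{1+\delta} - 1)} \le \frac{1}{2p(p-1)}$

by calculations similar to those of (22.7.1). Hence the series

$F(\delta) = \sum_{p} \left\{ \log \left( 1 - \frac{1}{p^{1+\delta}} \right) + \frac{1}{p^{1+\delta}} \right\}$

is uniformly convergent for all $\delta \ge 0$ and so

$F(\delta) \to F(0)$

as $\delta \to 0$ through positive values.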

We now suppose $\delta > 0$. By Theorem 280,

$F(\delta) = g(\delta) - \log \zeta(1 + \delta)$,

where

$g(\delta) = \sum_{p} p^{-1-\delta}$.

If, in Theorem 421, we put $c_p = 1/p$ and $c_n = 0$ when $n$ is not prime, we
have

$C(x) = \sum_{p \le x} \frac{1}{p} = \log\log x + B_1 + E(x)$

by (22.7.3). Hence, if $f(t) = t^{-\delta}$, (22.5.2) becomes

$\sum_{p \le x} p^{-1-\delta} = x^{-\delta} C(x) + \delta \int_{2}^{x} t^{-1-\delta} C(t)\, dt$.

† See, for example, Whittaker and Watson, Modern analysis, ch. xii.
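Letting $x \to \infty$, we have

$g(\delta) = \delta \int_{2}^{\infty} t^{-1-\delta} C(t)\, dt$
$= \delta \int_{2}^{\infty} t^{-1-\delta} (\log\log t + B_1)\, dt + \delta \int_{2}^{\infty} t^{-1-\delta} E(t)\, dt$.

Now, if we put $t = e^{u/\delta}$,

$\delta \int_{1}^{\infty} t^{-1-\delta} \log\log t\, dt = \int_{0}^{\infty} e^{-u} \log \left( \frac{u}{\delta} \right) du = -\gamma - \log \delta$

by (22.8.2), and

$\delta \int_{1}^{\infty} t^{-1-\delta}\, dt = 1$.

Hence

$g(\delta) + \log \delta - B_1 + \gamma = \delta \int_{2}^{\infty} t^{-1-\delta} E(t)\, dt - \delta \int_{1}^{2} t^{-1-\delta} (\log\log t + B_1)\, dt$.

Now, by (22.7.4), if $T = \exp(1/\sqrt{\delta})$,

$\left| \delta \int_{2}^{\infty} \frac{E(t)}{t^{1+\delta}}\, dt \right| < A\delta \int_{2}^{T} \frac{dt}{t} + \frac{A\delta}{\log T} \int_{T}^{\infty} \frac{dt}{t^{1+\delta}}$
$< A\delta \log T + \frac{A}{\log T} < A\sqrt{\delta} \to 0$,

as $\delta \to 0$. Also

$\left| \int_{1}^{2} t^{-1-\delta} (\log\log t + B_1)\, dt \right| < \int_{1}^{2} (|\log\log t| + |B_1|)\, dt = A$,

since the integral converges at $t = 1$. Hence

$g(\delta) + \log \delta \to B_1 - \gamma$

as $\delta \to 0$.

But, by Theorem 282,

$\log \zeta(1 + \delta) + \log \delta \to 0$

as $\delta \to 0$ and so

$F(\delta) \to B_1 - \gamma$.

Hence

$B_1 = \gamma + F(0)$,

which is (22.8.1).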
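Both Theorem 428 and Theorem 429 can be examined numerically. The sketch below truncates the series in (22.8.1) at $p \le 10^6$ and compares the result with the commonly quoted decimal 0.2614972128 for $B_1$ (an assumption, not a value from the text); it also forms the ratio of the two sides of Mertens’s theorem.

```python
import math

EULER_GAMMA = 0.5772156649015329  # standard value of Euler's constant


def primes_up_to(n):
    """Primes p <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


primes = primes_up_to(10 ** 6)  # illustrative bound

# (22.8.1), truncated: gamma + sum over p of { log(1 - 1/p) + 1/p }.
b1_from_series = EULER_GAMMA + math.fsum(math.log1p(-1.0 / p) + 1.0 / p
                                         for p in primes)


def mertens_ratio(x):
    """e^gamma * log x * product over p <= x of (1 - 1/p); tends to 1."""
    log_product = math.fsum(math.log1p(-1.0 / p) for p in primes if p <= x)
    return math.exp(EULER_GAMMA + log_product) * math.log(x)


print(b1_from_series)
for x in (10 ** 2, 10 ** 4, 10 ** 6):
    print(x, mertens_ratio(x))
```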

22.9. Proof of Theorems 323 and 328. We are now able to prove
Theorems 323 and 328. If we write

$f_1(n) = \frac{\phi(n) e^{\gamma} \log\log n}{n}$, $f_2(n) = \frac{\sigma(n)}{n e^{\gamma} \log\log n}$,

we have to show that

$\liminf f_1(n) = 1$, $\limsup f_2(n) = 1$.

It will be enough to find two functions $F_1(t)$, $F_2(t)$, each tending to 1 as
$t \to \infty$ and such that

(22.9.1) $f_1(n) \ge F_1(\log n)$, $f_2(n) \le \frac{1}{F_1(\log n)}$

for all $n \ge 3$ and

(22.9.2) $f_2(n_j) \ge F_2(j)$, $f_1(n_j) \le \frac{1}{F_2(j)}$

for an infinite increasing sequence $n_2, n_3, n_4, \ldots$.

By Theorem 329, $f_1(n) f_2(n) < 1$ and so the second inequality in (22.9.1)
follows from the first; similarly for (22.9.2).

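Let $p_1, p_2, \ldots, p_{r-\rho}$ be the primes which divide $n$ and which do not
exceed $\log n$ and let $p_{r-\rho+1}, \ldots, p_r$ be those which divide $n$ and are
greater than $\log n$. We have

$(\log n)^{\rho} < p_{r-\rho+1} \cdots p_r \le n$, $\rho < \frac{\log n}{\log\log n}$

and so

$\frac{\phi(n)}{n} = \prod_{i=1}^{r} \left( 1 - \frac{1}{p_i} \right) \ge \left( 1 - \frac{1}{\log n} \right)^{\rho} \prod_{i=1}^{r-\rho} \left( 1 - \frac{1}{p_i} \right)$
$> \left( 1 - \frac{1}{\log n} \right)^{\log n / \log\log n} \prod_{p \le \log n} \left( 1 - \frac{1}{p} \right)$.

Hence the first part of (22.9.1) is true with

$F_1(t) = e^{\gamma} \log t \left( 1 - \frac{1}{t} \right)^{t/\log t} \prod_{p \le t} \left( 1 - \frac{1}{p} \right)$.

But, by Theorem 429, as $t \to \infty$,

$F_1(t) \sim \left( 1 - \frac{1}{t} \right)^{t/\log t} = 1 + O\left( \frac{1}{\log t} \right) \to 1$.

To prove the first part of (22.9.2), we write

$n_j = \prod_{p \le e^j} p^j$ ($j \ge 2$),

so that

$\log n_j = j \vartheta(e^j) \le A j e^j$

by Theorem 414. Hence

$\log\log n_j \le A_0 + j + \log j$.

Again

$\prod_{p \le e^j} (1 - p^{-j-1}) > \prod_{p} (1 - p^{-j-1}) = \frac{1}{\zeta(j+1)}$

by Theorem 280. Hence

$f_2(n_j) = \frac{\sigma(n_j)}{n_j e^{\gamma} \log\log n_j} = \frac{e^{-\gamma}}{\log\log n_j} \prod_{p \le e^j} \left( \frac{1 - p^{-j-1}}{1 - p^{-1}} \right)$
$\ge \frac{e^{-\gamma}}{\zeta(j+1)(A_0 + j + \log j)} \prod_{p \le e^j} \frac{1}{1 - p^{-1}} = F_2(j)$

(say). This is the first part of (22.9.2). Again, as $j \to \infty$, $\zeta(j+1) \to 1$
and, by Theorem 429,

$F_2(j) \sim \frac{j}{\zeta(j+1)(A_0 + j + \log j)} \to 1$.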
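The functions $f_1$ and $f_2$ can be evaluated along a convenient sequence. The sketch below uses the products of the first $r$ primes, which is our own illustrative choice and not the sequence $n_j$ of the proof; since these $n$ are far too large to write down, $\log n$, $\phi(n)/n$ and $\sigma(n)/n$ are computed from the prime factors directly.

```python
import math

EULER_GAMMA = 0.5772156649015329  # standard value of Euler's constant


def primes_up_to(n):
    """Primes p <= n by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


primes = primes_up_to(10_000)  # illustrative bound; contains more than 1000 primes


def f1_f2_for_primorial(r):
    """(f_1(n), f_2(n)) for n = p_1 p_2 ... p_r, the product of the first r primes."""
    chosen = primes[:r]
    log_n = math.fsum(math.log(p) for p in chosen)
    phi_over_n = math.exp(math.fsum(math.log1p(-1.0 / p) for p in chosen))
    sigma_over_n = math.exp(math.fsum(math.log1p(1.0 / p) for p in chosen))
    scale = math.exp(EULER_GAMMA) * math.log(log_n)
    return phi_over_n * scale, sigma_over_n / scale


for r in (5, 50, 500, 1000):
    print(r, f1_f2_for_primorial(r))
```

Along this sequence $f_1$ approaches 1 while $f_2$ does not, since $\sigma(n)/n$ is comparatively small for squarefree $n$; this is the reason for the high powers of primes in the definition of $n_j$.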


22.10. The number of prime factors of $n$. We define $\omega(n)$ as the number
of different prime factors of $n$, and $\Omega(n)$ as its total number of prime
factors; thus

$\omega(n) = r$, $\Omega(n) = a_1 + a_2 + \cdots + a_r$,

when $n = p_1^{a_1} \cdots p_r^{a_r}$.

Both $\omega(n)$ and $\Omega(n)$ behave irregularly for large $n$. Thus both functions
are 1 when $n$ is prime, while

$\Omega(n) = \frac{\log n}{\log 2}$

when $n$ is a power of 2. If

$n = p_1 p_2 \cdots p_r$

is the product of the first $r$ primes, then

$\omega(n) = r = \pi(p_r)$, $\log n = \vartheta(p_r)$

and so, by Theorems 420 and 414,

$\omega(n) \sim \frac{\vartheta(p_r)}{\log p_r} \sim \frac{\log n}{\log\log n}$

(when $n \to \infty$ through this particular sequence of values).
Theorem 430. The average order of both $\omega(n)$ and $\Omega(n)$ is $\log\log n$.
More precisely

(22.10.1) $\sum_{n \le x} \omega(n) = x \log\log x + B_1 x + o(x)$,

(22.10.2) $\sum_{n \le x} \Omega(n) = x \log\log x + B_2 x + o(x)$,

where $B_1$ is the number in Theorems 427 and 428 and

$B_2 = B_1 + \sum_{p} \frac{1}{p(p-1)}$.

We write

$S_1 = \sum_{n \le x} \omega(n) = \sum_{n \le x} \sum_{p | n} 1 = \sum_{p \le x} \left[ \frac{x}{p} \right]$,

since there are just $[x/p]$ values of $n \le x$ which are multiples of $p$. Removing
the square brackets, we have

(22.10.3) $S_1 = \sum_{p \le x} \frac{x}{p} + O\{\pi(x)\} = x \log\log x + B_1 x + o(x)$

by Theorems 7 and 427.

Similarly

(22.10.4) $S_2 = \sum_{n \le x} \Omega(n) = \sum_{n \le x} \sum_{p^m | n} 1 = \sum_{p^m \le x} \left[ \frac{x}{p^m} \right]$,

so that

$S_2 - S_1 = \sum{}' \left[ x/p^m \right]$,

where $\sum'$ denotes summation over all $p^m \le x$ for which $m \ge 2$. If we
remove the square brackets in the last sum the error introduced is less than

$\sum{}' 1 \le \sum{}' \frac{\log p}{\log 2} = \frac{\psi(x) - \vartheta(x)}{\log 2} = o(x)$

by Theorem 413. Hence

$S_2 - S_1 = x \sum{}' p^{-m} + o(x)$.

The series

$\sum_{m=2}^{\infty} \sum_{p} \frac{1}{p^m} = \sum_{p} \left( \frac{1}{p^2} + \frac{1}{p^3} + \cdots \right) = \sum_{p} \frac{1}{p(p-1)} = B_2 - B_1$

is convergent and so

$\sum{}' p^{-m} = B_2 - B_1 + o(1)$

as $x \to \infty$. Hence

$S_2 - S_1 = (B_2 - B_1)x + o(x)$

and (22.10.2) follows from (22.10.3).
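The averages in Theorem 430 can be computed by a sieve. This is a minimal sketch for $x = 10^5$ (an illustrative choice); the comparison values 0.2614972 for $B_1$ and 0.7731567 for $\sum 1/(p(p-1))$ are commonly quoted decimals and are assumptions, not values from the text. The error terms are only $o(x)$, so agreement with $B_1$ is rough, while the difference $B_2 - B_1$ is reproduced much more closely.

```python
import math


def omega_tables(limit):
    """Return (omega, big_omega), lists indexed by n = 0..limit."""
    omega = [0] * (limit + 1)
    big_omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
            power = p
            while power <= limit:  # one count for every power p^m dividing n
                for multiple in range(power, limit + 1, power):
                    big_omega[multiple] += 1
                power *= p
    return omega, big_omega


X = 10 ** 5  # illustrative bound
omega, big_omega = omega_tables(X)
loglog_x = math.log(math.log(X))

b1_estimate = sum(omega[1:]) / X - loglog_x       # compare (22.10.1)
b2_estimate = sum(big_omega[1:]) / X - loglog_x   # compare (22.10.2)

print("B_1 estimate:", b1_estimate)
print("B_2 estimate:", b2_estimate)
print("B_2 - B_1 estimate:", b2_estimate - b1_estimate)
```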

22.11. The normal order of $\omega(n)$ and $\Omega(n)$. The functions $\omega(n)$ and
$\Omega(n)$ are irregular, but have a definite ‘average order’ $\log\log n$. There is
another interesting sense in which they may be said to have ‘on the whole’
a definite order. We shall say, roughly, that $f(n)$ has the normal order $F(n)$
if $f(n)$ is approximately $F(n)$ for almost all values of $n$. More precisely,
suppose that

(22.11.1) $(1 - \epsilon) F(n) < f(n) < (1 + \epsilon) F(n)$

for every positive $\epsilon$ and almost all values of $n$. Then we say that the normal
order of $f(n)$ is $F(n)$. Here ‘almost all’ is used in the sense of §§ 1.6 and
9.9. There may be an exceptional ‘infinitesimal’ set of $n$ for which (22.11.1)
is false, and this exceptional set will naturally depend upon $\epsilon$.

A function may possess an average order, but no normal order, or
conversely. Thus the function

$f(n) = 0$ ($n$ even), $f(n) = 2$ ($n$ odd)

has the average order 1, but no normal order. The function

$f(n) = 2^m$ ($n = 2^m$), $f(n) = 1$ ($n \ne 2^m$)

has the normal order 1, but no average order.
Theorem 431. The normal order of $\omega(n)$ and $\Omega(n)$ is $\log\log n$. More
precisely, the number of $n$, not exceeding $x$, for which

(22.11.2) $|f(n) - \log\log n| > (\log\log n)^{\frac{1}{2}+\delta}$,

where $f(n)$ is $\omega(n)$ or $\Omega(n)$, is $o(x)$ for every positive $\delta$.

It is sufficient to prove that the number of $n$ for which

(22.11.3) $|f(n) - \log\log x| > (\log\log x)^{\frac{1}{2}+\delta}$

is $o(x)$; the distinction between $\log\log n$ and $\log\log x$ has no importance.
For

$\log\log x - 1 \le \log\log n \le \log\log x$

when $x^{1/e} \le n \le x$, so that $\log\log n$ is practically $\log\log x$ for all such
values of $n$; and the number of other values of $n$ in question is

$O(x^{1/e}) = o(x)$.

Next, we need only consider the case $f(n) = \omega(n)$. For $\Omega(n) \ge \omega(n)$
and, by (22.10.1) and (22.10.2),

$\sum_{n \le x} \{\Omega(n) - \omega(n)\} = O(x)$.

Hence the number of $n \le x$ for which

$\Omega(n) - \omega(n) > (\log\log x)^{\frac{1}{2}}$

is

$O\left( \frac{x}{(\log\log x)^{\frac{1}{2}}} \right) = o(x)$;

so that one case of Theorem 431 follows from the other.

Let us consider the number of pairs of different prime factors $p$, $q$ of
$n$ (i.e. $p \ne q$), counting the pair $q$, $p$ distinct from $p$, $q$. There are $\omega(n)$
possible values of $p$ and, with each of these, just $\omega(n) - 1$ possible values
of $q$. Hence

$\omega(n) \{\omega(n) - 1\} = \sum_{pq | n,\ p \ne q} 1 = \sum_{pq | n} 1 - \sum_{p^2 | n} 1$.

Summing over all $n \le x$, we have

$\sum_{n \le x} \{\omega(n)\}^2 - \sum_{n \le x} \omega(n) = \sum_{n \le x} \left( \sum_{pq | n} 1 - \sum_{p^2 | n} 1 \right) = \sum_{pq \le x} \left[ \frac{x}{pq} \right] - \sum_{p^2 \le x} \left[ \frac{x}{p^2} \right]$.

First

$\sum_{p^2 \le x} \left[ \frac{x}{p^2} \right] \le \sum_{p^2 \le x} \frac{x}{p^2} = O(x)$,

since the series $\sum p^{-2}$ is convergent. Next

$\sum_{pq \le x} \left[ \frac{x}{pq} \right] = x \sum_{pq \le x} \frac{1}{pq} + O(x)$.

Hence, using (22.10.1), we have

(22.11.4) $\sum_{n \le x} \{\omega(n)\}^2 = x \sum_{pq \le x} \frac{1}{pq} + O(x \log\log x)$.

Now

(22.11.5) $\left( \sum_{p \le \sqrt{x}} \frac{1}{p} \right)^2 \le \sum_{pq \le x} \frac{1}{pq} \le \left( \sum_{p \le x} \frac{1}{p} \right)^2$,

since, if $pq \le x$ then $p < x$ and $q < x$, while, if $p \le \sqrt{x}$ and $q \le \sqrt{x}$, then
$pq \le x$. The outside terms in (22.11.5) are each

$\{\log\log x + O(1)\}^2 = (\log\log x)^2 + O(\log\log x)$

and therefore

(22.11.6) $\sum_{n \le x} \{\omega(n)\}^2 = x (\log\log x)^2 + O(x \log\log x)$.

It follows that

(22.11.7) $\sum_{n \le x} \{\omega(n) - \log\log x\}^2$
$= \sum_{n \le x} \{\omega(n)\}^2 - 2 \log\log x \sum_{n \le x} \omega(n) + [x] (\log\log x)^2$
$= x (\log\log x)^2 + O(x \log\log x) - 2 \log\log x \{x \log\log x + O(x)\} + \{x + O(1)\} (\log\log x)^2$
$= x (\log\log x)^2 - 2x (\log\log x)^2 + x (\log\log x)^2 + O(x \log\log x)$
$= O(x \log\log x)$,

by (22.10.1) and (22.11.6).

If there are more than $\eta x$ numbers, not exceeding $x$, which satisfy
(22.11.3) with $f(n) = \omega(n)$, then

$\sum_{n \le x} \{\omega(n) - \log\log x\}^2 \ge \eta x (\log\log x)^{1+2\delta}$,

which contradicts (22.11.7) for sufficiently large $x$; and this is true for every
positive $\eta$. Hence the number of $n$ which satisfy (22.11.3) is $o(x)$; and this
proves the theorem.
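The key estimate (22.11.7) says that the mean square of $\omega(n) - \log\log x$ is of order $\log\log x$. The following is a minimal sketch for $x = 10^5$ (an illustrative choice) which computes that mean square and counts how many $n \le x$ have $\omega(n)$ farther than 2 from $\log\log x$; the threshold 2 is our own choice and is not the quantity $(\log\log x)^{\frac{1}{2}+\delta}$ of the theorem.

```python
import math


def omega_table(limit):
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= limit."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # p is prime
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
    return omega


X = 10 ** 5  # illustrative bound
omega = omega_table(X)
loglog_x = math.log(math.log(X))

second_moment = sum((omega[n] - loglog_x) ** 2 for n in range(1, X + 1))
ratio = second_moment / (X * loglog_x)  # bounded, by (22.11.7)
far_from_mean = sum(1 for n in range(1, X + 1) if abs(omega[n] - loglog_x) > 2)

print("loglog x:", loglog_x)
print("second moment / (x loglog x):", ratio)
print("proportion with |omega(n) - loglog x| > 2:", far_from_mean / X)
```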

22.12. A note on round numbers. A number is usually called ‘round’
if it is the product of a considerable number of comparatively small factors.
Thus $1200 = 2^4 \cdot 3 \cdot 5^2$ would certainly be called round. The roundness of
a number like $2187 = 3^7$ is obscured by the decimal notation.

It is a matter of common observation that round numbers are very rare;
the fact may be verified by any one who will make a habit of factorizing
numbers which, like numbers of taxi-cabs or railway carriages, are
presented to his attention in a random manner. Theorem 431 contains the
mathematical explanation of this phenomenon.

Either of the functions $\omega(n)$ or $\Omega(n)$ gives a natural measure of the
‘roundness’ of $n$, and each of them is usually about $\log\log n$, a function of
$n$ which increases very slowly. Thus $\log\log 10^7$ is a little less than 3, and
$\log\log 10^{80}$ is a little larger than 5. A number near $10^7$ (the limit of the
factor tables) will usually have about 3 prime factors; and a number near
$10^{80}$ (the number, approximately, of protons in the universe) about 5 or 6.
A number like

$6092087 = 37 \cdot 229 \cdot 719$

is in a sense a ‘typical’ number.

These facts seem at first very surprising, but the real paradox lies a little
deeper. What is really surprising is that most numbers should have so many
factors and not that they should have so few. Theorem 431 contains two
assertions, that $\omega(n)$ is usually not much larger than $\log\log n$ and that it is
usually not much smaller; and it is the second assertion which lies deeper
and is more difficult to prove. That $\omega(n)$ is usually not much larger than
$\log\log n$ can be deduced from Theorem 430 without the aid of (22.11.6).†
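The numerical remarks in this section are easily reproduced. The following is a minimal sketch using trial division; `factorize` is our own helper.

```python
import math


def factorize(n):
    """Prime factorization of n >= 2 as a dict {prime: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


loglog_1e7 = math.log(math.log(10 ** 7))
loglog_1e80 = math.log(math.log(10 ** 80))

print(factorize(1200), factorize(2187), factorize(6092087))
print(loglog_1e7, loglog_1e80)
```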

22.13. The normal order of $d(n)$. If $n = p_1^{a_1} \cdots p_r^{a_r}$, then

$\omega(n) = r$, $\Omega(n) = a_1 + a_2 + \cdots + a_r$,

$d(n) = (1 + a_1)(1 + a_2) \cdots (1 + a_r)$.

Also

$2 \le 1 + a \le 2^a$

and

$2^{\omega(n)} \le d(n) \le 2^{\Omega(n)}$.

Hence, after Theorem 431, the normal order of $\log d(n)$ is

$\log 2 \log\log n$.

† Roughly, if $\chi(x)$ were of higher order than $\log\log x$, and $\omega(n)$ were larger than $\chi(n)$ for a fixed
proportion of numbers less than $x$, then

$\sum_{n \le x} \omega(n)$

would be larger than a fixed multiple of $x \chi(x)$, in contradiction to Theorem 430.
Theorem 432. If $\epsilon$ is positive, then

(22.13.1) $2^{(1-\epsilon) \log\log n} < d(n) < 2^{(1+\epsilon) \log\log n}$

for almost all numbers $n$.

Thus $d(n)$ is ‘usually’ about

$2^{\log\log n} = (\log n)^{\log 2} = (\log n)^{0.69\ldots}$.

We cannot quite say that ‘the normal order of $d(n)$ is $2^{\log\log n}$’ since the
inequalities (22.13.1) are of a less precise type than (22.11.1); but one may
say, more roughly, that ‘the normal order of $d(n)$ is about $2^{\log\log n}$’.

It should be observed that this normal order is notably less than the
average order $\log n$. The average

$\frac{1}{n} \{d(1) + d(2) + \cdots + d(n)\}$

is dominated, not by the ‘normal’ $n$ for which $d(n)$ has its most common
magnitude, but by the small minority of $n$ for which $d(n)$ is very much
larger than $\log n$.† The irregularities of $\omega(n)$ and $\Omega(n)$ are not sufficiently
violent to produce a similar effect.
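The inequality $2^{\omega(n)} \le d(n) \le 2^{\Omega(n)}$ and the contrast between the normal and the average size of $d(n)$ can both be seen in a small computation. This is a minimal sketch over $n \le 10^4$ (an illustrative range); `exponents` is our own helper.

```python
import math
from statistics import median


def exponents(n):
    """Exponents a_1, ..., a_r in the prime factorization of n (empty for n = 1)."""
    result = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            a = 0
            while n % d == 0:
                n //= d
                a += 1
            result.append(a)
        d += 1
    if n > 1:
        result.append(1)
    return result


def d_omega_big_omega(n):
    """Return (d(n), omega(n), Omega(n))."""
    exps = exponents(n)
    return math.prod(a + 1 for a in exps), len(exps), sum(exps)


LIMIT = 10_000  # illustrative bound
values = [d_omega_big_omega(n) for n in range(1, LIMIT + 1)]
sandwich_holds = all(2 ** w <= d <= 2 ** big_w for d, w, big_w in values)
mean_d = sum(d for d, _, _ in values) / LIMIT
median_d = median(d for d, _, _ in values)

print("2^omega <= d <= 2^Omega for all n:", sandwich_holds)
print("mean of d(n):", mean_d, " median of d(n):", median_d)
```

The mean is noticeably larger than the median, which reflects the influence of the minority of $n$ with many divisors.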

22.14. Selberg’s theorem. We devote the next three sections to the
proof of Theorem 6. Of the earlier results of this chapter we use only
Theorems 420-4 and the fact that

(22.14.1) $\psi(x) = O(x)$,

which is part of Theorem 414. We prove first

Theorem 433 (Selberg’s theorem):

(22.14.2) $\psi(x) \log x + \sum_{n \le x} \Lambda(n) \psi\left( \frac{x}{n} \right) = 2x \log x + O(x)$

and

(22.14.3) $\sum_{n \le x} \Lambda(n) \log n + \sum_{mn \le x} \Lambda(m) \Lambda(n) = 2x \log x + O(x)$.

† See the remarks at the ends of §§ 18.1 and 18.2.
It is easy to see that (22.14.2) and (22.14.3) are equivalent. For

$\sum_{n \le x} \Lambda(n) \psi\left( \frac{x}{n} \right) = \sum_{n \le x} \Lambda(n) \sum_{m \le x/n} \Lambda(m) = \sum_{mn \le x} \Lambda(m) \Lambda(n)$

and, if we put $c_n = \Lambda(n)$ and $f(t) = \log t$ in (22.5.2),

(22.14.4) $\sum_{n \le x} \Lambda(n) \log n = \psi(x) \log x - \int_{2}^{x} \frac{\psi(t)}{t}\, dt = \psi(x) \log x + O(x)$

by (22.14.1).

In our proof of (22.14.3) we use the Möbius function $\mu(n)$ defined in
§ 16.3. We recall Theorems 263, 296, and 298 by which

(22.14.5) $\sum_{d | n} \mu(d) = 1$ ($n = 1$), $\sum_{d | n} \mu(d) = 0$ ($n > 1$),

(22.14.6) $\Lambda(n) = - \sum_{d | n} \mu(d) \log d$, $\log n = \sum_{d | n} \Lambda(d)$.

Hence

(22.14.7) $\sum_{h | n} \Lambda(h) \Lambda\left( \frac{n}{h} \right) = - \sum_{h | n} \Lambda(h) \sum_{d | \frac{n}{h}} \mu(d) \log d$
$= - \sum_{d | n} \mu(d) \log d \sum_{h | \frac{n}{d}} \Lambda(h) = - \sum_{d | n} \mu(d) \log d \log\left( \frac{n}{d} \right)$
$= \Lambda(n) \log n + \sum_{d | n} \mu(d) \log^2 d$.

Again, by (22.14.5),

$\sum_{d | 1} \mu(d) \log^2 \left( \frac{x}{d} \right) = \log^2 x$,
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but, for n > 1, 

^2 tog 2 (?) = M(^)(tog 2 */ - 2 log* log d ) 

d\n d\n 

= 2 Ain) log* — A in) logw + Aik) Aifc) 

hk=n 

by (22.14.6) and (22.14.7). Hence, if we write 

S(x) = ^ZY1 W) log 2 Q) , 

n^x d\n 

we have 

S(x ) = log 2 * + 2^r(*) log* — ^Z Ain) log n + ^ A(h)A(k) 

n^x hk^x 

= ^A(«) log/i + ^Z A(m)A(n) + 0(x) 

n^jc mn^x 

by (22.14.4). To complete the proof of (22.14.3), we have only to show 
that 

(22.14.8) S(*) = 2* log* + 0(x). 

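By (22.14.5),

$$S(x) - \gamma^2 = \sum_{n\le x}\sum_{d\mid n}\mu(d)\left\{\log^2\frac{x}{d} - \gamma^2\right\} = \sum_{d\le x}\mu(d)\left[\frac{x}{d}\right]\left\{\log^2\frac{x}{d} - \gamma^2\right\},$$

since the number of $n \le x$, for which $d \mid n$, is $[x/d]$. If we remove the square
brackets, the error introduced is less than

$$\sum_{d\le x}\left\{\log^2\frac{x}{d} + \gamma^2\right\} = O(x)$$

by Theorem 423. Hence

$$S(x) = x\sum_{d\le x}\frac{\mu(d)}{d}\left\{\log^2\frac{x}{d} - \gamma^2\right\} + O(x). \tag{22.14.9}$$

Now, by Theorem 422,

$$\sum_{d\le x}\frac{\mu(d)}{d}\left\{\log^2\frac{x}{d} - \gamma^2\right\} = \sum_{d\le x}\frac{\mu(d)}{d}\left\{\log\frac{x}{d} - \gamma\right\}\left\{\sum_{k\le x/d}\frac{1}{k} + O\!\left(\frac{d}{x}\right)\right\}. \tag{22.14.10}$$

The sum of the various error terms is at most

$$\sum_{d\le x}\frac{1}{d}\left\{\log\frac{x}{d} + \gamma\right\}O\!\left(\frac{d}{x}\right) = O\!\left(\frac{1}{x}\right)\sum_{d\le x}\log\frac{x}{d} + O(1) = O(1) \tag{22.14.11}$$

by Theorem 423. Also

$$\begin{aligned}
\sum_{d\le x}\frac{\mu(d)}{d}\left\{\log\frac{x}{d} - \gamma\right\}\sum_{k\le x/d}\frac{1}{k}
&= \sum_{dk\le x}\frac{\mu(d)}{dk}\left\{\log\frac{x}{d} - \gamma\right\} = \sum_{n\le x}\frac{1}{n}\sum_{d\mid n}\mu(d)\left\{\log\frac{x}{d} - \gamma\right\} \\
&= \log x - \gamma + \sum_{2\le n\le x}\frac{\Lambda(n)}{n} = 2\log x + O(1)
\end{aligned} \tag{22.14.12}$$

by (22.14.5), (22.14.6), and Theorem 424. (22.14.8) follows when we
combine (22.14.9) to (22.14.12).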
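As a numerical illustration (not part of the proof), the following Python sketch evaluates the left-hand side of (22.14.2) for an integer $x$ and reports the normalized difference from $2x\log x$, which the theorem asserts stays bounded. The function names `von_mangoldt_table` and `selberg_defect` are our own, chosen for this sketch only.

```python
import math


def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        # p is prime: mark its multiples and record log p at each power of p.
        for multiple in range(p * p, limit + 1, p):
            is_composite[multiple] = True
        log_p = math.log(p)
        power = p
        while power <= limit:
            lam[power] = log_p
            power *= p
    return lam


def selberg_defect(x):
    """Return (psi(x) log x + sum_{n<=x} Lambda(n) psi(x/n) - 2 x log x) / x."""
    lam = von_mangoldt_table(x)
    # psi[m] = sum of Lambda(n) over n <= m (prefix sums).
    psi = [0.0] * (x + 1)
    for n in range(1, x + 1):
        psi[n] = psi[n - 1] + lam[n]
    lhs = psi[x] * math.log(x)
    for n in range(2, x + 1):
        if lam[n]:
            # psi(x/n) depends only on the integer part of x/n.
            lhs += lam[n] * psi[x // n]
    return (lhs - 2 * x * math.log(x)) / x
```

Calling `selberg_defect(x)` for increasing `x` gives a sequence that should remain bounded if the $O(x)$ term is correct; the sketch makes no claim about the value of the implied constant.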

22.15. The functions $R(x)$ and $V(\xi)$. After Theorem 420 the Prime
Number Theorem (Theorem 6) is equivalent to

Theorem 434:

$$\psi(x) \sim x,$$

and it is this last theorem that we shall prove. If we put

$$\psi(x) = x + R(x)$$

in (22.14.2) and use Theorem 424, we have

$$R(x)\log x + \sum_{n\le x}\Lambda(n)R\!\left(\frac{x}{n}\right) = O(x). \tag{22.15.1}$$

Our object is to prove that $R(x) = o(x)$.†

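If we replace $n$ by $m$ and $x$ by $x/n$ in (22.15.1), we have

$$R\!\left(\frac{x}{n}\right)\log\frac{x}{n} + \sum_{m\le x/n}\Lambda(m)R\!\left(\frac{x}{mn}\right) = O\!\left(\frac{x}{n}\right).$$

Hence

$$\begin{aligned}
\log x\left\{R(x)\log x + \sum_{n\le x}\Lambda(n)R\!\left(\frac{x}{n}\right)\right\}
&- \sum_{n\le x}\Lambda(n)\left\{R\!\left(\frac{x}{n}\right)\log\frac{x}{n} + \sum_{m\le x/n}\Lambda(m)R\!\left(\frac{x}{mn}\right)\right\} \\
&= O(x\log x) + O\!\left(x\sum_{n\le x}\frac{\Lambda(n)}{n}\right) = O(x\log x),
\end{aligned}$$

that is

$$R(x)\log^2 x = -\sum_{n\le x}\Lambda(n)R\!\left(\frac{x}{n}\right)\log n + \sum_{mn\le x}\Lambda(m)\Lambda(n)R\!\left(\frac{x}{mn}\right) + O(x\log x),$$

whence

$$|R(x)|\log^2 x \le \sum_{n\le x}a_n\left|R\!\left(\frac{x}{n}\right)\right| + O(x\log x), \tag{22.15.2}$$

where

$$a_n = \Lambda(n)\log n + \sum_{hk=n}\Lambda(h)\Lambda(k)$$

and

$$\sum_{n\le x}a_n = 2x\log x + O(x)$$

by (22.14.3).

† Of course, this would be a trivial deduction if $R(x) \ge 0$ for all $x$ (or if $R(x) \le 0$ for all $x$). Indeed,
more would follow, viz. $R(x) = O(x/\log x)$. But it is possible, so far as we know at this stage of our
argument, that $R(x)$ is usually of order $x$, but that its positive and negative values are so distributed
that the sum over $n$ on the left-hand side of (22.15.1) is of opposite sign to the first term and largely
offsets it.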

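We now replace the sum on the right-hand side of (22.15.2) by an integral.
To do so, we shall prove that

$$\sum_{n\le x}a_n\left|R\!\left(\frac{x}{n}\right)\right| = 2\int_1^x\left|R\!\left(\frac{x}{t}\right)\right|\log t\,dt + O(x\log x). \tag{22.15.3}$$

We remark that, if $t > t' \ge 0$,

$$\bigl||R(t)| - |R(t')|\bigr| \le |R(t) - R(t')| = |\psi(t) - \psi(t') - t + t'| \le \psi(t) - \psi(t') + t - t' = F(t) - F(t'),$$

where

$$F(t) = \psi(t) + t = O(t)$$

and $F(t)$ is a steadily increasing function of $t$. Also

$$\sum_{n\le x-1}n\left\{F\!\left(\frac{x}{n}\right) - F\!\left(\frac{x}{n+1}\right)\right\} = \sum_{n\le x}F\!\left(\frac{x}{n}\right) - [x]\,F\!\left(\frac{x}{[x]}\right) = O\!\left(x\sum_{n\le x}\frac{1}{n}\right) = O(x\log x). \tag{22.15.4}$$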

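We prove (22.15.3) in two stages. First, if we put

$$c_1 = 0, \qquad c_n = a_n - 2\int_{n-1}^{n}\log t\,dt, \qquad f(n) = \left|R\!\left(\frac{x}{n}\right)\right|$$

in (22.5.1), we have

$$C(x) = \sum_{n\le x}a_n - 2\int_1^{[x]}\log t\,dt = O(x)$$

and

$$\begin{aligned}
\sum_{n\le x}a_n\left|R\!\left(\frac{x}{n}\right)\right| &- 2\sum_{2\le n\le x}\left|R\!\left(\frac{x}{n}\right)\right|\int_{n-1}^{n}\log t\,dt \\
&= \sum_{n\le x-1}C(n)\left\{\left|R\!\left(\frac{x}{n}\right)\right| - \left|R\!\left(\frac{x}{n+1}\right)\right|\right\} + C(x)\left|R\!\left(\frac{x}{[x]}\right)\right| \\
&= O\!\left(\sum_{n\le x-1}n\left\{F\!\left(\frac{x}{n}\right) - F\!\left(\frac{x}{n+1}\right)\right\}\right) + O(x) = O(x\log x)
\end{aligned} \tag{22.15.5}$$

by (22.15.4).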
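Next

$$\begin{aligned}
\left|\,\left|R\!\left(\frac{x}{n}\right)\right|\int_{n-1}^{n}\log t\,dt - \int_{n-1}^{n}\left|R\!\left(\frac{x}{t}\right)\right|\log t\,dt\,\right|
&\le \int_{n-1}^{n}\left\{F\!\left(\frac{x}{t}\right) - F\!\left(\frac{x}{n}\right)\right\}\log t\,dt \\
&\le (n-1)\left\{F\!\left(\frac{x}{n-1}\right) - F\!\left(\frac{x}{n}\right)\right\}.
\end{aligned}$$

Hence

$$\begin{aligned}
\sum_{2\le n\le x}\left|R\!\left(\frac{x}{n}\right)\right|\int_{n-1}^{n}\log t\,dt &- \int_1^x\left|R\!\left(\frac{x}{t}\right)\right|\log t\,dt \\
&= O\!\left(\sum_{2\le n\le x}(n-1)\left\{F\!\left(\frac{x}{n-1}\right) - F\!\left(\frac{x}{n}\right)\right\}\right) + O(x\log x) = O(x\log x).
\end{aligned} \tag{22.15.6}$$

Combining (22.15.5) and (22.15.6) we have (22.15.3).

Using (22.15.3) in (22.15.2) we have

$$|R(x)|\log^2 x \le 2\int_1^x\left|R\!\left(\frac{x}{t}\right)\right|\log t\,dt + O(x\log x). \tag{22.15.7}$$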

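We can make the significance of this inequality a little clearer if we
introduce a new function, viz.

$$V(\xi) = e^{-\xi}R(e^{\xi}) = e^{-\xi}\psi(e^{\xi}) - 1 = e^{-\xi}\Bigl\{\sum_{n\le e^{\xi}}\Lambda(n)\Bigr\} - 1. \tag{22.15.8}$$

If we write $x = e^{\xi}$ and $t = xe^{-\eta}$, we have

$$\int_1^x\left|R\!\left(\frac{x}{t}\right)\right|\log t\,dt = x\int_0^{\xi}|V(\eta)|(\xi - \eta)\,d\eta = x\int_0^{\xi}|V(\eta)|\int_{\eta}^{\xi}d\zeta\,d\eta = x\int_0^{\xi}\int_0^{\zeta}|V(\eta)|\,d\eta\,d\zeta,$$

on changing the order of integration. (22.15.7) becomes

$$\xi^2|V(\xi)| \le 2\int_0^{\xi}\int_0^{\zeta}|V(\eta)|\,d\eta\,d\zeta + O(\xi). \tag{22.15.9}$$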

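Since $\psi(x) = O(x)$, it follows from (22.15.8) that $V(\xi)$ is bounded as
$\xi \to \infty$. Hence we may write

$$\alpha = \limsup_{\xi\to\infty}|V(\xi)|, \qquad \beta = \limsup_{\xi\to\infty}\frac{1}{\xi}\int_0^{\xi}|V(\eta)|\,d\eta,$$

since both these upper limits exist. Clearly

$$|V(\xi)| \le \alpha + o(1) \tag{22.15.10}$$

and

$$\int_0^{\xi}|V(\eta)|\,d\eta \le \beta\xi + o(\xi).$$

Using this in (22.15.9), we have

$$\xi^2|V(\xi)| \le 2\int_0^{\xi}\{\beta\zeta + o(\zeta)\}\,d\zeta + O(\xi) = \beta\xi^2 + o(\xi^2)$$

and so

$$|V(\xi)| \le \beta + o(1).$$

Hence

$$\alpha \le \beta. \tag{22.15.11}$$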

22.16. Completion of the proof of Theorems 434, 6, and 8. By
(22.15.8), Theorem 434 is equivalent to the statement that $V(\xi) \to 0$
as $\xi \to \infty$, that is, that $\alpha = 0$. We now suppose that $\alpha > 0$ and prove that,
in that case, $\beta < \alpha$ in contradiction to (22.15.11). We require two further
lemmas.

Theorem 435. There is a fixed positive number $A_1$, such that, for every
positive $\xi_1$, $\xi_2$, we have

$$\left|\int_{\xi_1}^{\xi_2}V(\eta)\,d\eta\right| < A_1.$$

If we put $x = e^{\xi}$, $t = e^{\eta}$, we have

$$\int_0^{\xi}V(\eta)\,d\eta = \int_1^x\left\{\frac{\psi(t)}{t^2} - \frac{1}{t}\right\}dt = O(1).$$

Hence

$$\int_{\xi_1}^{\xi_2}V(\eta)\,d\eta = \int_0^{\xi_2}V(\eta)\,d\eta - \int_0^{\xi_1}V(\eta)\,d\eta = O(1)$$

and this is Theorem 435.


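Theorem 436. If $\eta_0 > 0$ and $V(\eta_0) = 0$, then

$$\int_0^{\alpha}|V(\eta_0 + \tau)|\,d\tau \le \tfrac12\alpha^2 + O(\eta_0^{-1}).$$

We may write (22.14.2) in the form

$$\psi(x)\log x + \sum_{mn\le x}\Lambda(m)\Lambda(n) = 2x\log x + O(x).$$

If $x > x_0 \ge 1$, the same result is true with $x_0$ substituted for $x$. Subtracting,
we have

$$\psi(x)\log x - \psi(x_0)\log x_0 + \sum_{x_0 < mn\le x}\Lambda(m)\Lambda(n) = 2(x\log x - x_0\log x_0) + O(x).$$

Since $\Lambda(n) \ge 0$,

$$0 \le \psi(x)\log x - \psi(x_0)\log x_0 \le 2(x\log x - x_0\log x_0) + O(x),$$

whence

$$|R(x)\log x - R(x_0)\log x_0| \le x\log x - x_0\log x_0 + O(x).$$

We put $x = e^{\eta_0 + \tau}$, $x_0 = e^{\eta_0}$, so that $R(x_0) = 0$. We have, since
$0 \le \tau \le \alpha$,

$$|V(\eta_0 + \tau)| \le 1 - \left(\frac{\eta_0}{\eta_0 + \tau}\right)e^{-\tau} + O\!\left(\frac{1}{\eta_0}\right) = 1 - e^{-\tau} + O\!\left(\frac{1}{\eta_0}\right) \le \tau + O\!\left(\frac{1}{\eta_0}\right)$$

and so

$$\int_0^{\alpha}|V(\eta_0 + \tau)|\,d\tau \le \int_0^{\alpha}\left\{\tau + O\!\left(\frac{1}{\eta_0}\right)\right\}d\tau = \tfrac12\alpha^2 + O\!\left(\frac{1}{\eta_0}\right).$$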

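We now write

$$\delta = \frac{3\alpha^2 + 4A_1}{2\alpha} > \alpha,$$

take $\zeta$ to be any positive number and consider the behaviour of $V(\eta)$ in
the interval $\zeta \le \eta \le \zeta + \delta - \alpha$. By (22.15.8), $V(\eta)$ decreases steadily as
$\eta$ increases, except at its discontinuities, where $V(\eta)$ increases. Hence, in
our interval, either $V(\eta_0) = 0$ for some $\eta_0$ or $V(\eta)$ changes sign at most
once. In the first case, we use (22.15.10) and Theorem 436 and have

$$\begin{aligned}
\int_{\zeta}^{\zeta+\delta}|V(\eta)|\,d\eta &= \left(\int_{\zeta}^{\eta_0} + \int_{\eta_0}^{\eta_0+\alpha} + \int_{\eta_0+\alpha}^{\zeta+\delta}\right)|V(\eta)|\,d\eta \\
&\le \alpha(\eta_0 - \zeta) + \tfrac12\alpha^2 + \alpha(\zeta + \delta - \eta_0 - \alpha) + o(1) \\
&= \alpha(\delta - \tfrac12\alpha) + o(1) = \alpha'\delta + o(1)
\end{aligned}$$

for large $\zeta$, where

$$\alpha' = \alpha\left(1 - \frac{\alpha}{2\delta}\right) < \alpha.$$

In the second case, if $V(\eta)$ changes sign just once at $\eta = \eta_1$ in the
interval $\zeta \le \eta \le \zeta + \delta - \alpha$, we have

$$\int_{\zeta}^{\zeta+\delta-\alpha}|V(\eta)|\,d\eta = \left|\int_{\zeta}^{\eta_1}V(\eta)\,d\eta\right| + \left|\int_{\eta_1}^{\zeta+\delta-\alpha}V(\eta)\,d\eta\right| < 2A_1,$$

while, if $V(\eta)$ does not change sign at all in the interval, we have

$$\int_{\zeta}^{\zeta+\delta-\alpha}|V(\eta)|\,d\eta = \left|\int_{\zeta}^{\zeta+\delta-\alpha}V(\eta)\,d\eta\right| < A_1$$

by Theorem 435. Hence

$$\int_{\zeta}^{\zeta+\delta}|V(\eta)|\,d\eta = \left(\int_{\zeta}^{\zeta+\delta-\alpha} + \int_{\zeta+\delta-\alpha}^{\zeta+\delta}\right)|V(\eta)|\,d\eta < 2A_1 + \alpha^2 + o(1) = \alpha''\delta + o(1),$$

where

$$\alpha'' = \frac{2A_1 + \alpha^2}{\delta} = \alpha\left(\frac{4A_1 + 2\alpha^2}{4A_1 + 3\alpha^2}\right) = \alpha\left(1 - \frac{\alpha}{2\delta}\right) = \alpha'.$$

Hence we have always

$$\int_{\zeta}^{\zeta+\delta}|V(\eta)|\,d\eta \le \alpha'\delta + o(1),$$

where $o(1) \to 0$ as $\zeta \to \infty$. If $M = [\xi/\delta]$,

$$\int_0^{\xi}|V(\eta)|\,d\eta = \sum_{m=0}^{M-1}\int_{m\delta}^{(m+1)\delta}|V(\eta)|\,d\eta + \int_{M\delta}^{\xi}|V(\eta)|\,d\eta \le \alpha'M\delta + o(M) + O(1) = \alpha'\xi + o(\xi).$$

Hence

$$\beta = \limsup_{\xi\to\infty}\frac{1}{\xi}\int_0^{\xi}|V(\eta)|\,d\eta \le \alpha' < \alpha,$$

in contradiction to (22.15.11). It follows that $\alpha = 0$, whence we have
Theorem 434 and Theorem 6. As we saw on p. 10, Theorem 8 is a trivial
deduction from Theorem 6.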

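22.17. Proof of Theorem 335. Theorem 335 is a simple consequence
of Theorem 434. We have

$$\sum_{n\le x}\mu(n)\log\frac{x}{n} = O(x)$$

by Theorem 423 and so

$$M(x)\log x = \sum_{n\le x}\mu(n)\log n + O(x).$$

By Theorem 297, with the notation of § 22.15,

$$\begin{aligned}
-\sum_{n\le x}\mu(n)\log n &= \sum_{n\le x}\sum_{d\mid n}\mu\!\left(\frac{n}{d}\right)\Lambda(d) = \sum_{dk\le x}\mu(k)\Lambda(d) = \sum_{k\le x}\mu(k)\,\psi\!\left(\frac{x}{k}\right) \\
&= \sum_{k\le x}\mu(k)\left[\frac{x}{k}\right] + \sum_{k\le x}\mu(k)R\!\left(\left[\frac{x}{k}\right]\right) = S_3 + S_4
\end{aligned}$$

(say). Now, by (22.14.5),

$$S_3 = \sum_{k\le x}\mu(k)\left[\frac{x}{k}\right] = \sum_{n\le x}\sum_{k\mid n}\mu(k) = 1.$$

By Theorem 434, $R(x) = o(x)$; that is, for any $\epsilon > 0$, there is an integer
$N = N(\epsilon)$ such that $|R(x)| < \epsilon x$ for all $x \ge N$. Again, by Theorem 414,
$|R(x)| < Ax$ for all $x \ge 1$. Hence

$$\begin{aligned}
|S_4| \le \sum_{k\le x}\left|R\!\left(\left[\frac{x}{k}\right]\right)\right| &\le \sum_{k\le x/N}\epsilon\left[\frac{x}{k}\right] + \sum_{x/N < k\le x}A\left[\frac{x}{k}\right] \\
&\le \epsilon x\log(x/N) + Ax\{\log x - \log(x/N)\} + O(x) = \epsilon x\log x + O(x).
\end{aligned}$$

Since $\epsilon$ is arbitrary, it follows that $S_4 = o(x\log x)$ and so

$$-M(x)\log x = S_3 + S_4 + O(x) = o(x\log x),$$

whence Theorem 335.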
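For readers who wish to see $M(x) = o(x)$ numerically, here is a small Python sketch that tabulates $\mu(n)$ by a sieve and sums it. It is an illustration only; the names `mobius_table` and `mertens` are assumptions of this sketch, not notation from the text.

```python
def mobius_table(limit):
    """Return a list mu with mu[n] = Moebius mu(n) for 1 <= n <= limit (mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        # p is prime: flip the sign of every multiple of p ...
        for multiple in range(p, limit + 1, p):
            if multiple > p:
                is_composite[multiple] = True
            mu[multiple] = -mu[multiple]
        # ... and zero out every multiple of p^2.
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0
    return mu


def mertens(x):
    """Return M(x) = sum of mu(n) over 1 <= n <= x."""
    return sum(mobius_table(x)[1:])
```

The ratio `mertens(x) / x` can then be inspected for a few values of `x`; the theorem says only that it tends to 0, not how fast.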

22.18. Products of k prime factors. Let $k \ge 1$ and consider a positive
integer $n$ which is the product of just $k$ prime factors, i.e.

$$n = p_1p_2\cdots p_k. \tag{22.18.1}$$

In the notation of § 22.10, $\Omega(n) = k$. We write $\tau_k(x)$ for the number of
such $n \le x$. If we impose the additional restriction that all the $p$ in (22.18.1)
shall be different, $n$ is squarefree and $\omega(n) = \Omega(n) = k$. We write $\pi_k(x)$
for the number of these (squarefree) $n \le x$. We shall prove

Theorem 437:

$$\pi_k(x) \sim \tau_k(x) \sim \frac{x(\log\log x)^{k-1}}{(k-1)!\,\log x} \quad (k \ge 2).$$

For $k = 1$, this result would reduce to Theorem 6, if, as usual, we take
$0! = 1$.

To prove Theorem 437, we introduce three auxiliary functions, viz.

$$L_k(x) = \sum\frac{1}{p_1p_2\cdots p_k}, \qquad \Pi_k(x) = \sum 1, \qquad \vartheta_k(x) = \sum\log(p_1p_2\cdots p_k),$$

where the summation in each case extends over all sets of primes $p_1, p_2, \ldots, p_k$
such that $p_1\cdots p_k \le x$, two sets being considered different even if they
differ only in the order of the $p$. If we write $c_n$ for the number of ways in
which $n$ can be represented in the form (22.18.1), we have

$$\Pi_k(x) = \sum_{n\le x}c_n, \qquad \vartheta_k(x) = \sum_{n\le x}c_n\log n.$$

If all the $p$ in (22.18.1) are different, $c_n = k!$, while in any case $c_n \le k!$. If
$n$ is not of the form (22.18.1), $c_n = 0$. Hence

$$k!\,\pi_k(x) \le \Pi_k(x) \le k!\,\tau_k(x) \quad (k \ge 1). \tag{22.18.2}$$

Again, for $k \ge 2$, consider those $n$ which are of the form (22.18.1) with at
least two of the $p$ equal. The number of these $n \le x$ is $\tau_k(x) - \pi_k(x)$. Every
such $n$ can be expressed in the form (22.18.1) with $p_{k-1} = p_k$ and so

$$\tau_k(x) - \pi_k(x) \le \sum_{p_1p_2\cdots p_{k-1}^2\le x}1 \le \sum_{p_1p_2\cdots p_{k-1}\le x}1 = \Pi_{k-1}(x) \quad (k \ge 2). \tag{22.18.3}$$

We shall prove below that

$$\vartheta_k(x) \sim kx(\log\log x)^{k-1} \quad (k \ge 2). \tag{22.18.4}$$

By (22.5.2) with $f(t) = \log t$, we have

$$\vartheta_k(x) = \Pi_k(x)\log x - \int_2^x\frac{\Pi_k(t)}{t}\,dt.$$

Now $\tau_k(x) \le x$ and so, by (22.18.2), $\Pi_k(t) = O(t)$ and

$$\int_2^x\frac{\Pi_k(t)}{t}\,dt = O(x).$$

Hence, for $k \ge 2$,

$$\Pi_k(x) = \frac{\vartheta_k(x)}{\log x} + O\!\left(\frac{x}{\log x}\right) \sim \frac{kx(\log\log x)^{k-1}}{\log x} \tag{22.18.5}$$

by (22.18.4). But this is also true for $k = 1$ by Theorem 6, since $\Pi_1(x) = \pi(x)$.
When we use (22.18.5) in (22.18.2) and (22.18.3), Theorem 437
follows at once.
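The counting functions $\tau_k(x)$ and $\pi_k(x)$ are easy to tabulate for small $x$. The Python sketch below does so by sieving $\Omega(n)$ and $\omega(n)$, and also evaluates the main term of Theorem 437 for comparison. The helper names (`factor_counts`, `tau_k`, `pi_k`, `main_term`) are ours, and since the asymptotic formula involves $\log\log x$ the agreement at small $x$ should be expected to be rough.

```python
import math


def factor_counts(limit):
    """Return (big_omega, small_omega): prime-factor counts with and without multiplicity."""
    big_omega = [0] * (limit + 1)
    small_omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if small_omega[p] != 0:
            continue  # some smaller prime divides p, so p is composite
        for multiple in range(p, limit + 1, p):
            small_omega[multiple] += 1
        # Each power p^j dividing n contributes one to Omega(n).
        power = p
        while power <= limit:
            for multiple in range(power, limit + 1, power):
                big_omega[multiple] += 1
            power *= p
    return big_omega, small_omega


def tau_k(x, k):
    """Number of n <= x that are products of exactly k primes (with repetition)."""
    big_omega, _ = factor_counts(x)
    return sum(1 for n in range(2, x + 1) if big_omega[n] == k)


def pi_k(x, k):
    """Number of squarefree n <= x that are products of exactly k primes."""
    big_omega, small_omega = factor_counts(x)
    return sum(1 for n in range(2, x + 1)
               if big_omega[n] == k and small_omega[n] == k)


def main_term(x, k):
    """Main term x (log log x)^(k-1) / ((k-1)! log x) of Theorem 437."""
    return x * math.log(math.log(x)) ** (k - 1) / (math.factorial(k - 1) * math.log(x))
```

For instance, `tau_k(100, 2)` counts the products of two primes up to 100 and `pi_k(100, 2)` the squarefree ones; their difference is the number of prime squares up to 100.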

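We have now to prove (22.18.4). For all $k \ge 1$,

$$\begin{aligned}
k\vartheta_{k+1}(x) &= \sum_{p_1\cdots p_{k+1}\le x}\{\log(p_2p_3\cdots p_{k+1}) + \log(p_1p_3p_4\cdots p_{k+1}) + \cdots + \log(p_1p_2\cdots p_k)\} \\
&= (k+1)\sum_{p_1\cdots p_{k+1}\le x}\log(p_2p_3\cdots p_{k+1}) = (k+1)\sum_{p_1\le x}\vartheta_k\!\left(\frac{x}{p_1}\right)
\end{aligned}$$

and, if we put $L_0(x) = 1$,

$$L_k(x) = \sum_{p_1\cdots p_k\le x}\frac{1}{p_1\cdots p_k} = \sum_{p_1\le x}\frac{1}{p_1}L_{k-1}\!\left(\frac{x}{p_1}\right).$$

Hence, if we write

$$f_k(x) = \vartheta_k(x) - kxL_{k-1}(x),$$

we have

$$kf_{k+1}(x) = (k+1)\sum_{p\le x}f_k\!\left(\frac{x}{p}\right). \tag{22.18.6}$$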

We use this to prove by induction that

$$f_k(x) = o\{x(\log\log x)^{k-1}\} \quad (k \ge 1). \tag{22.18.7}$$

First

$$f_1(x) = \vartheta_1(x) - x = \vartheta(x) - x = o(x)$$

by Theorems 6 and 420, so that (22.18.7) is true for $k = 1$. Let us suppose
(22.18.7) true for $k = K \ge 1$ so that, for any $\epsilon > 0$, there is an $x_0 =
x_0(K, \epsilon)$ such that

$$|f_K(x)| < \epsilon x(\log\log x)^{K-1}$$

for all $x \ge x_0$. From the definition of $f_K(x)$, we see that

$$|f_K(x)| < D$$

for $1 \le x < x_0$, where $D$ depends only on $K$ and $\epsilon$. Hence

$$\sum_{p\le x/x_0}\left|f_K\!\left(\frac{x}{p}\right)\right| < \epsilon(\log\log x)^{K-1}\sum_{p\le x/x_0}\frac{x}{p} < 2\epsilon x(\log\log x)^{K}$$

for large enough $x$, by Theorem 427. Again

$$\sum_{x/x_0 < p\le x}\left|f_K\!\left(\frac{x}{p}\right)\right| < D\pi(x) < Dx.$$

Hence, by (22.18.6), since $K + 1 \le 2K$,

$$|f_{K+1}(x)| < 2x\{2\epsilon(\log\log x)^{K} + D\} < 5\epsilon x(\log\log x)^{K}$$

for $x > x_1 = x_1(\epsilon, D, K) = x_1(\epsilon, K)$. Since $\epsilon$ is arbitrary, this implies
(22.18.7) for $k = K + 1$ and it follows for all $k \ge 1$ by induction.

After (22.18.7), we can complete the proof of (22.18.4) by showing that

$$L_k(x) \sim (\log\log x)^{k} \quad (k \ge 1). \tag{22.18.8}$$

In (22.18.1), if every $p_i \le x^{1/k}$, then $n \le x$; conversely, if $n \le x$, then
$p_i \le x$ for every $i$. Hence

$$\Bigl(\sum_{p\le x^{1/k}}\frac{1}{p}\Bigr)^{k} \le L_k(x) \le \Bigl(\sum_{p\le x}\frac{1}{p}\Bigr)^{k}.$$

But, by Theorem 427,

$$\sum_{p\le x}\frac{1}{p} \sim \log\log x, \qquad \sum_{p\le x^{1/k}}\frac{1}{p} \sim \log\left(\frac{\log x}{k}\right) \sim \log\log x$$

and (22.18.8) follows at once.

22.19. Primes in an interval. Suppose that $\epsilon > 0$. By Theorem 6,

$$\pi(x + \epsilon x) - \pi(x) = \frac{x + \epsilon x}{\log x + \log(1 + \epsilon)} - \frac{x}{\log x} + o\!\left(\frac{x}{\log x}\right) = \frac{\epsilon x}{\log x} + o\!\left(\frac{x}{\log x}\right). \tag{22.19.1}$$

The last expression is positive provided that $x > x_0(\epsilon)$. Hence there is
always a prime $p$ satisfying

$$x < p \le (1 + \epsilon)x \tag{22.19.2}$$

when $x > x_0(\epsilon)$. This result may be compared with Theorem 418. The
latter corresponds to the case $\epsilon = 1$ of (22.19.2), but holds for all $x \ge 1$.
If we put $\epsilon = 1$ in (22.19.1), we have

$$\pi(2x) - \pi(x) = \frac{x}{\log x} + o\!\left(\frac{x}{\log x}\right) \sim \pi(x). \tag{22.19.3}$$

Thus, to a first approximation, the number of primes between $x$ and $2x$ is
the same as the number less than $x$. At first sight this is surprising, since we
know that the primes near $x$ ‘thin out’ (in some vague sense) as $x$ increases.
In fact, $\pi(2x) - 2\pi(x) \to -\infty$ as $x \to \infty$ (though we cannot prove this
here), but this is not inconsistent with (22.19.3), which is equivalent to

$$\pi(2x) - 2\pi(x) = o\{\pi(x)\}.$$
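A short Python sketch makes the comparison in (22.19.3) concrete by counting the primes in $(x, 2x]$ with a sieve. The function names `prime_flags` and `primes_between` are our own; nothing here goes beyond direct counting.

```python
def prime_flags(limit):
    """Return a list flags with flags[n] True exactly when n is prime, 0 <= n <= limit."""
    flags = [True] * (limit + 1)
    for small in range(min(2, limit + 1)):
        flags[small] = False  # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
        p += 1
    return flags


def primes_between(x, y):
    """Number of primes p with x < p <= y, i.e. pi(y) - pi(x), for integers 0 <= x <= y."""
    flags = prime_flags(y)
    return sum(1 for n in range(x + 1, y + 1) if flags[n])
```

Comparing `primes_between(x, 2 * x)` with `primes_between(0, x)` for growing `x` shows the two counts staying of the same order while their difference drifts negative, as the text describes.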

22.20. A conjecture about the distribution of prime pairs p, p + 2.
Although, as we remarked in § 1.4, it is not known whether there is an
infinity of prime-pairs $p, p + 2$, there is an argument which makes it plausible
that

$$P_2(x) \sim \frac{2C_2x}{(\log x)^2}, \tag{22.20.1}$$

where $P_2(x)$ is the number of these pairs with $p \le x$ and

$$C_2 = \prod_{p\ge 3}\frac{p(p-2)}{(p-1)^2}.$$

We take $x$ any large positive number and write

$$N = \prod_{p\le\sqrt{x}}p.$$

We shall call any integer $n$ which is prime to $N$, i.e. any $n$ not divisible by any
prime $p$ not exceeding $\sqrt{x}$, a special integer and denote by $S(X)$ the number
of special integers which are less than or equal to $X$. By Theorem 62,

$$S(N) = \phi(N) = N\prod_{p\le\sqrt{x}}\left(1 - \frac{1}{p}\right) = NB(x)$$

(say). Hence the proportion of special integers in the interval $(1, N)$ is
$B(x)$. It is easily seen that the proportion is the same in any complete set
of residues (mod $N$) and so in any set of $rN$ consecutive integers for any
positive integral $r$.

If the proportion were the same in the interval $(1, x)$, we should have

$$S(x) = xB(x) \sim \frac{2e^{-\gamma}x}{\log x}$$

by Theorem 429. But this is false. For every composite $n$ not exceeding $x$
has a prime factor not exceeding $\sqrt{x}$ and so the special $n$ not exceeding $x$
are just the primes between $\sqrt{x}$ (exclusive) and $x$ (inclusive). We have then

$$S(x) = \pi(x) - \pi(\sqrt{x}) \sim \frac{x}{\log x}$$

by Theorem 6. Hence the proportion of special integers in the interval $(1, x)$
is about $\tfrac12e^{\gamma}$ times the proportion in the interval $(1, N)$.

There is nothing surprising in this, for, in the notation of § 22.1,

$$\log N = \vartheta(\sqrt{x}) \sim \sqrt{x}$$

by Theorems 413 and 434, and so $N$ is much greater than $x$. The proportion
of special integers in every interval of length $N$ need not be the same as that
in a particular interval of (much shorter) length $x$.† Indeed, $S(\sqrt{x}) = 0$,
and so in the particular interval $(1, \sqrt{x})$ the proportion is 0. We observe
that the proportion in the interval $(N - x, N)$ is again about $1/\log x$, and
that in the interval $(N - \sqrt{x}, N)$ is again 0.

Next we evaluate the number of pairs $n, n + 2$ of special integers for
which $n \le N$. If $n$ and $n + 2$ are both special, we must have

$$n \equiv 1 \pmod 2, \qquad n \equiv 2 \pmod 3$$

and

$$n \equiv 1, 2, 3, \ldots, p - 3, \text{ or } p - 1 \pmod p \quad (3 < p \le \sqrt{x}).$$

The number of different possible residues for $n$ (mod $N$) is therefore

$$\prod_{3\le p\le\sqrt{x}}(p - 2) = \tfrac12N\prod_{3\le p\le\sqrt{x}}\left(1 - \frac{2}{p}\right) = NB_1(x)$$

(say) and this is the number of special pairs $n, n + 2$ with $n \le N$.

Thus the proportion of special pairs in the interval $(1, N)$ is $B_1(x)$ and
the same is clearly true in any interval of $rN$ consecutive integers. In the
smaller interval $(1, x)$, however, the proportion of special integers is about
$\tfrac12e^{\gamma}$ times the proportion in the longer intervals. We may therefore expect
(and it is here only that we ‘expect’ and cannot prove) that the proportion
of special pairs $n, n + 2$ in the interval $(1, x)$ is about $(\tfrac12e^{\gamma})^2$ times the
proportion in the longer intervals. But the special pairs in the interval $(1, x)$
are the prime pairs $p, p + 2$ in the interval $(\sqrt{x}, x)$. Hence we should expect
that

$$P_2(x) - P_2(\sqrt{x}) \sim \tfrac14e^{2\gamma}xB_1(x).$$

By Theorem 429,

$$B(x) \sim \frac{2e^{-\gamma}}{\log x}$$

and so

$$\tfrac14e^{2\gamma}B_1(x) \sim \frac{1}{(\log x)^2}\,\frac{B_1(x)}{\{B(x)\}^2}.$$

But

$$\frac{B_1(x)}{\{B(x)\}^2} = 2\prod_{3\le p\le\sqrt{x}}\frac{(1 - 2/p)}{(1 - 1/p)^2} = 2\prod_{3\le p\le\sqrt{x}}\frac{p(p-2)}{(p-1)^2} \to 2C_2$$

as $x \to \infty$. Since $P_2(\sqrt{x}) = O(\sqrt{x})$, we have finally the result (22.20.1).

† Considerations of this kind explain why the usual ‘probability’ arguments lead to the wrong
asymptotic value for $\pi(x)$.
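The conjecture (22.20.1) can be compared with actual counts. The Python sketch below counts the pairs $p, p+2$ with $p \le x$, approximates $C_2$ by a truncated product, and evaluates the conjectured main term. The function names and the truncation bound are assumptions of this sketch; since (22.20.1) is only an asymptotic statement (and a conjectural one), close agreement at small $x$ should not be expected.

```python
import math


def prime_flags(limit):
    """Return a list flags with flags[n] True exactly when n is prime, 0 <= n <= limit."""
    flags = [True] * (limit + 1)
    for small in range(min(2, limit + 1)):
        flags[small] = False
    p = 2
    while p * p <= limit:
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
        p += 1
    return flags


def twin_pair_count(x):
    """P_2(x): number of primes p <= x for which p + 2 is also prime."""
    flags = prime_flags(x + 2)
    return sum(1 for p in range(2, x + 1) if flags[p] and flags[p + 2])


def twin_constant(bound=10000):
    """Truncated product of p(p-2)/(p-1)^2 over primes 3 <= p <= bound, approximating C_2."""
    flags = prime_flags(bound)
    product = 1.0
    for p in range(3, bound + 1):
        if flags[p]:
            product *= p * (p - 2) / (p - 1) ** 2
    return product


def conjectured_count(x, bound=10000):
    """Main term 2 C_2 x / (log x)^2 of (22.20.1), with C_2 truncated at `bound`."""
    return 2 * twin_constant(bound) * x / math.log(x) ** 2
```

One can print `twin_pair_count(x)` next to `conjectured_count(x)` for several `x` to see how the ratio behaves.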


NOTES 

§§ 22.1, 2, and 4. The theorems of these sections are essentially Tchebychef’s.
Theorem 416 was found independently by de Polignac. Theorem 415 is an improvement of a
result of Tchebychef’s; the proof we give here is due to Erdős and Kalmár.

There is full information about the history of the theory of primes in Dickson’s History 
(i, ch. xviii), in Ingham’s tract (introduction and ch. i), and in Landau’s Handbuch (3-102 
and 883-5); and we do not give detailed references. 

There is also an elaborate account of the early history of the theory in Torelli, Sulla
totalità dei numeri primi, Atti della R. Acad. di Napoli (2) 11 (1902), 1-222; and shorter
ones in the introductions to Glaisher’s Factor table for the sixth million (London, 1883)
and Lehmer’s table referred to in the note on § 1.4.

§ 22.2. Various authors have given versions of Theorem 414 with explicit numerical
constants. Thus Tchebychef (Mem. Acad. Sc. St. Petersburg 7 (1850-1854), 15-33) showed
that

$$(0.921\ldots)\,x \le \vartheta(x) \le (1.105\ldots)\,x$$

for large enough $x$, and used this in his proof of Bertrand’s postulate. Diamond and Erdős
(Enseign. Math. (2) 26 (1980), 313-21) have shown that elementary methods of the kind
used by Tchebychef allow one to get upper and lower bound constants as close to 1 as
desired. Unfortunately, since their paper actually uses the Prime Number Theorem in the
course of the argument, their result does not produce an independent proof of the theorem.

§ 22.3. ‘Bertrand’s postulate’ is that, for every $n > 3$, there is a prime $p$ satisfying
$n < p < 2n - 2$. Bertrand verified this for $n$ < 3,000,000 and Tchebychef proved it for all
$n > 3$ in 1850. Our Theorem 418 states a little less but the proof could be modified to prove
the better result. Our proof is due to Erdős, Acta Litt. Ac. Sci. (Szeged), 5 (1932), 194-8.

For Theorem 419, see L. Moser, Math. Mag. 23 (1950), 163-4. See also Mills, Bull.
American Math. Soc. 53 (1947), 604; Bang, Norsk. Mat. Tidsskr. 34 (1952), 117-18; and
Wright, American Math. Monthly, 58 (1951), 616-18 and 59 (1952), 99 and Journal London
Math. Soc. 29 (1954), 63-71.

§ 22.7. Euler proved in 1737 that $\sum p^{-1}$ and $\prod(1 - p^{-1})$ are divergent.

§ 22.8. For Theorem 429 see Mertens, Journal für Math. 78 (1874), 46-62. For another
proof (given in the first two editions of this book) see Hardy, Journal London Math. Soc.
10 (1935), 91-94.

§ 22. 10. Theorem 430 is stated, in a rather more precise form, by Hardy and Ramanujan, 
Quarterly Journal of Math. 48 (1917), 76-92 (no. 35 of Ramanujan’s Collected papers). It 
may be older, but we cannot give any reference. 

§§ 22.11-13. These theorems were first proved by Hardy and Ramanujan in the paper
referred to in the preceding note. The proof given here is due to Turán, Journal London
Math. Soc. 9 (1934), 274-6, except for a simplification suggested to us by Mr. Marshall
Hall. Turán [ibid. 11 (1936), 125-33] has generalized the theorems in two directions.

In fact the function $(\omega(n) - \log\log n)/\sqrt{\log\log n}$ is normally distributed, in the sense
that, for any fixed real $z$, one has

$$x^{-1}\,\#\left\{n \le x : \frac{\omega(n) - \log\log n}{\sqrt{\log\log n}} \le z\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-w^2/2}\,dw$$

as $x \to \infty$. The same is true if $\omega(n)$ is replaced by $\Omega(n)$. These results are due to Erdős
and Kac (Amer. J. Math. 62 (1940), 738-42).

There is a massive literature on the distribution of values of additive functions. See,
for example, Kubilius, Probabilistic methods in the theory of numbers (Providence, R.I.,
A.M.S., 1964) and Kac, Statistical independence in probability, analysis and number theory
(Washington, D.C., Math. Assoc. America, 1959).

§§ 22.14-16. A. Selberg gives his theorem in the forms

$$\vartheta(x)\log x + \sum_{p\le x}\vartheta\!\left(\frac{x}{p}\right)\log p = 2x\log x + O(x)$$

and

$$\sum_{p\le x}\log^2 p + \sum_{pp'\le x}\log p\,\log p' = 2x\log x + O(x).$$

These may be deduced without difficulty from Theorem 433. There are two essentially
different methods by which the Prime Number Theorem may be deduced from Selberg’s
theorem. For the first, due to Erdős and Selberg jointly, see Proc. Nat. Acad. Sci. 35 (1949),
374-84 and for the second, due to Selberg alone, see Annals of Math. 50 (1949), 305-13.
Both methods are more ‘elementary’ (in the logical sense) than the one we give, since they
avoid the use of the integral calculus at the cost of a little complication of detail. The method
which we use in §§ 22.15 and 16 is based essentially on Selberg’s own method. For the use
of $\psi(x)$ instead of $\vartheta(x)$, the introduction of the integral calculus and other minor changes,
see Wright, Proc. Roy. Soc. Edinburgh, 63 (1951), 251-61.

For an alternative exposition of the elementary proof of Theorem 6, see van der Corput,
Colloques sur la théorie des nombres (Liège 1956). See Errera (ibid. 111-18) for a short
(non-elementary) proof. The same volume (pp. 9-66) contains a reprint of the original paper
in which de la Vallée Poussin (contemporaneously with Hadamard, but independently) gave
the first proof (1896).

Later work by de la Vallée Poussin showed that

$$\pi(x) = \int_2^x\frac{dt}{\log t} + O\!\left(x\exp\{-c\sqrt{\log x}\}\right)$$

and

$$\psi(x) = x + O\!\left(x\exp\{-c\sqrt{\log x}\}\right)$$

for a certain positive constant $c$. These have been improved by subsequent authors, the best
known error term now being $O\!\left(x\exp\{-c(\log x)^{3/5}(\log\log x)^{-1/5}\}\right)$, due independently
to Korobov (Uspehi Mat. Nauk 13 (1958), no. 4 (82), 185-92) and Vinogradov (Izv. Akad.
Nauk SSSR. Ser. Mat. 22 (1958), 161-64).

For an alternative to the work of § 22.15, see V. Nevanlinna, Soc. Sci. Fennica: Comm.
Phys. Math. 27/3 (1962), 1-7. The same author (Ann. Acad. Sci. Fennicae A I 343 (1964),
1-52) gives a comparative account of the various elementary proofs.

Two other, quite different, elementary proofs of the prime number theorem have also
been given. These are by Daboussi (C. R. Acad. Sci. Paris Sér. I Math. 298 (1984), 161-64)
and Hildebrand (Mathematika 33 (1986), 23-30) respectively.

Various authors have shown that the elementary proof based on Selberg’s formulae can
be adapted to prove an explicit error term in the Prime Number Theorem. In particular
Diamond and Steinig (Invent. Math. 11 (1970), 199-258) showed in this way that

$$\psi(x) = x + O\!\left(x\exp(-\log^{\theta}x)\right)$$

for any fixed $\theta < \tfrac17$. See also Lavrik and Sobirov (Dokl. Akad. Nauk SSSR, 211 (1973),
534-6), Srinivasan and Sampath (J. Indian Math. Soc. (N.S.), 53 (1988), 1-50), and Lu
(Rocky Mountain J. Math. 29 (1999), 979-1053).

§ 22.18. Landau proved Theorem 437 in 1900 and found more detailed asymptotic
expansions for $\pi_k(x)$ and $\tau_k(x)$ in 1911. Subsequently Shah (1933) and S. Selberg (1940)
obtained results of the latter type by more elementary means. For our proof and references
to the literature, see Wright, Proc. Edinburgh Math. Soc. 9 (1954), 87-90.

§ 22.20. This type of argument can be applied to obtain similar conjectural asymptotic
formulae for the number of prime-triplets and of longer blocks of primes. See Cherwell and
Wright, Quart. J. Math. 11 (1960), 60-63 and Pólya, American Math. Monthly 66 (1959),
375-84. Hardy and Littlewood [Acta Math. 44 (1923), 1-70 (43)] found these formulae by
a different (analytic) method (also subject to an unproved hypothesis). They give references
to work by Staeckel and others. See also Cherwell, Quarterly Journal of Math. (Oxford),
17 (1946), 46-62, for another simple heuristic method.

The formulae agree very well with the results of counts. D. H. and E. Lehmer have carried
these out for various prime pairs, triplets, and quadruplets up to 40 million and Golubew has
counted quintuplets, ..., 9-plets up to 20 million (Österreich. Akad. Wiss. Math.-Naturwiss.
Kl. 1971, no. 1, 19-22). See also Leech (Math. Comp. 13 (1959), 56) and Bohman (BIT,
Nordisk Tidskr. Inform. behandl. 13 (1973), 242-4).



XXIII 

KRONECKER’S THEOREM 

23.1. Kronecker’s theorem in one dimension. Dirichlet’s Theorem
201 asserts that, given any set of real numbers $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$, we can
make $n\vartheta_1, n\vartheta_2, \ldots, n\vartheta_k$ all differ from integers by as little as we please.
This chapter is occupied by the study of a famous theorem of Kronecker
which has the same general character as this theorem of Dirichlet but lies
considerably deeper. The theorem is stated, in its general form, in § 23.4,
and proved, by three different methods, in §§ 23.7-9. For the moment
we consider only the simplest case, in which we are concerned with a
single $\vartheta$.

Suppose that we are given two numbers $\vartheta$ and $\alpha$. Can we find an integer
$n$ for which

$$n\vartheta - \alpha$$

is nearly an integer? The problem reduces to the simplest case of Dirichlet’s
problem when $\alpha = 0$.

It is obvious at once that the answer is no longer unrestrictedly affirmative.
If $\vartheta$ is a rational number $a/b$, in its lowest terms, then $(n\vartheta) = n\vartheta - [n\vartheta]$
has always one of the values

$$0, \ \frac{1}{b}, \ \frac{2}{b}, \ \ldots, \ \frac{b-1}{b}. \tag{23.1.1}$$

If $0 < \alpha < 1$, and $\alpha$ is not one of (23.1.1), then

$$\left|\frac{r}{b} - \alpha\right| \quad (r = 0, 1, \ldots, b)$$

has a positive minimum $\mu$, and $n\vartheta - \alpha$ cannot differ from an integer by
less than $\mu$.

Plainly $\mu \le 1/2b$, and $\mu \to 0$ when $b \to \infty$; and this suggests the truth
of the theorem which follows.

Theorem 438. If $\vartheta$ is irrational, $\alpha$ is arbitrary, and $N$ and $\epsilon$ are positive,
then there are integers $n$ and $p$ such that $n > N$ and

$$|n\vartheta - p - \alpha| < \epsilon. \tag{23.1.2}$$

We can state the substance of the theorem more picturesquely by using
the language of § 9.10. It asserts that there are $n$ for which $(n\vartheta)$ is as near
as we please to any number in (0, 1), or, in other words,

Theorem 439. If $\vartheta$ is irrational, then the set of points $(n\vartheta)$ is dense in
the interval (0, 1).†

Either of Theorems 438 and 439 may be called ‘Kronecker’s theorem in
one dimension’.
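The contrast between rational and irrational $\vartheta$ can be seen experimentally. The following Python sketch lists the fractional parts $(n\vartheta)$ for $n = 1, \ldots, N$ and measures the largest gap they leave on the circle of unit circumference. The names `fractional_parts` and `largest_gap` are our own, and floating-point arithmetic stands in for an irrational $\vartheta$, so the output is illustrative rather than a proof of anything.

```python
import math
from fractions import Fraction


def fractional_parts(theta, count):
    """Return the sorted values (n * theta) - floor(n * theta) for n = 1..count."""
    parts = []
    for n in range(1, count + 1):
        value = n * theta
        parts.append(value - math.floor(value))
    return sorted(parts)


def largest_gap(theta, count):
    """Largest gap between neighbouring points (n theta), n <= count, on the unit circle."""
    points = fractional_parts(theta, count)
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    # Close the circle: the gap from the last point round to the first.
    gaps.append(points[0] + 1 - points[-1])
    return max(gaps)
```

With `theta = Fraction(3, 7)` the points never leave the seven values of (23.1.1), so the largest gap stays at 1/7 however large `count` is; with `theta = math.sqrt(2)` the largest gap shrinks as `count` grows.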

23.2. Proofs of the one-dimensional theorem. Theorems 438 and 439 
are easy, but we give several proofs, to illustrate different ideas important 
in this field of arithmetic. Some of our arguments are, and some are not, 
extensible to space of more dimensions. 

(i) By Theorem 201, with $k = 1$, there are integers $n_1$ and $p$ such that
$|n_1\vartheta - p| < \epsilon$. The point $(n_1\vartheta)$ is therefore within a distance $\epsilon$ of either 0
or 1. The series of points

$$(n_1\vartheta),\ (2n_1\vartheta),\ (3n_1\vartheta),\ \ldots,$$

continued so long as may be necessary, mark a chain (in one direction or
the other) across the interval (0, 1) whose mesh‡ is less than $\epsilon$. There is
therefore a point $(kn_1\vartheta)$ or $(n\vartheta)$ within a distance $\epsilon$ of any $\alpha$ of (0, 1).

(ii) We can restate (i) so as to avoid an appeal to Theorem 201, and we 
do this explicitly because the proof resulting will be the model of our first 
proof in space of several dimensions. 

We have to prove the set $S$ of points $P_n$ or $(n\vartheta)$ with $n = 1, 2, 3, \ldots$,
dense in (0, 1). Since $\vartheta$ is irrational, no point falls at 0, and no two points
coincide. The set has therefore a limit point, and there are pairs $(P_n, P_{n+r})$,
with $r > 0$, and indeed with arbitrarily large $r$, as near to one another as
we please.

We call the directed stretch $P_nP_{n+r}$ a vector. If we mark off a stretch
$P_mQ$, equal to $P_nP_{n+r}$ and in the same direction, from any $P_m$, then $Q$ is
another point of $S$, and in fact $P_{m+r}$. It is to be understood, when we make
this construction, that if the stretch $P_mQ$ would extend beyond 0 or 1, then
the part of it so extending is to be replaced by a congruent part measured
from the other end 1 or 0 of the interval (0, 1).

There are vectors of length less than $\epsilon$, and such vectors, with $r > N$,
extending from any point of $S$ and in particular from $P_1$. If we measure off
such a vector repeatedly, starting from $P_1$, we obtain a chain of points with
the same properties as the chain of (i), and can complete the proof in the
same way.

† We may seem to have lost something when we state the theorem thus (viz. the inequality $n > N$).
But it is plain that, if there are points of the set as near as we please to every $\alpha$ of (0, 1), then among
these points there are points for which $n$ is as large as we please.

‡ The distance between consecutive points of the chain.

(iii) There is another interesting ‘geometrical’ proof which cannot be 
extended, easily at any rate, to space of many dimensions. 

We represent the real numbers, as in § 3.8, on a circle of unit circumference
instead of on a straight line. This representation automatically rejects
integers; 0 and 1 are represented by the same point of the circle and so,
generally, are $(n\vartheta)$ and $n\vartheta$.

To say that $S$ is dense on the circle is to say that every $\alpha$ belongs to the
derived set $S'$. If $\alpha$ belongs to $S$ but not to $S'$, there is an interval round
$\alpha$ free from points of $S$, except for $\alpha$ itself, and therefore there are points
near $\alpha$ belonging neither to $S$ nor to $S'$. It is therefore sufficient to prove
that every $\alpha$ belongs either to $S$ or to $S'$.

If $\alpha$ belongs neither to $S$ nor to $S'$, there is an interval $(\alpha - \delta, \alpha + \delta')$,
with positive $\delta$ and $\delta'$, which contains no point of $S$ inside it; and among
all such intervals there is a greatest.† We call this maximum interval $I(\alpha)$
the excluded interval of $\alpha$.

It is plain that, if $\alpha$ is surrounded by an excluded interval $I(\alpha)$, then
$\alpha - \vartheta$ is surrounded by a congruent excluded interval $I(\alpha - \vartheta)$. We thus
define an infinite series of intervals

$$I(\alpha),\ I(\alpha - \vartheta),\ I(\alpha - 2\vartheta),\ \ldots$$

similarly disposed about the points $\alpha, \alpha - \vartheta, \alpha - 2\vartheta, \ldots$. No two of these
intervals can coincide, since $\vartheta$ is irrational; and no two can overlap, since
two overlapping intervals would constitute together a larger interval, free
from points of $S$, about one of the points. This is a contradiction, since the
circumference cannot contain an infinity of non-overlapping intervals of
equal length. The contradiction shows that there can be no interval $I(\alpha)$,
and so proves the theorem.

(iv) Kronecker’s own proof is rather more sophisticated, but proves a 
good deal more. It proves 

Theorem 440. If & is irrational, a is arbitrary, and N positive, then 
there is an n > N and a p for which 


| ni? — p — a\ < -. 

n 


t We leave the formal proof, which depends upon the construction of ‘Dedekind sections’ of the 
possible values of S and S', and is of a type familiar in elementary analysis, to the reader. 
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It will be observed that this theorem, unlike Theorem 438, gives a definite 
bound for the ‘error’ in terms of n, of the same kind (though not so precise) 
as those given by Theorems 183 and 193 when a = 0. 

By Theorem 193 there are coprime integers $q > 2N$ and $r$ such that

(23.2.1) $|q\vartheta - r| < \frac{1}{q}.$

Suppose that $Q$ is the integer, or one of the two integers, such that

(23.2.2) $|q\alpha - Q| \le \frac{1}{2}.$

We can express $Q$ in the form

(23.2.3) $Q = vr - uq,$

where $u$ and $v$ are integers and

(23.2.4) $|v| \le \frac{1}{2}q.$

Then

$$q(v\vartheta - u - \alpha) = v(q\vartheta - r) - (q\alpha - Q),$$

and therefore

(23.2.5) $|q(v\vartheta - u - \alpha)| < \frac{1}{2}q \cdot \frac{1}{q} + \frac{1}{2} = 1,$

by (23.2.1), (23.2.2), and (23.2.4). If now we write

$$n = q + v, \quad p = r + u,$$

then

(23.2.6) $N < \frac{1}{2}q \le n \le \frac{3}{2}q$

and

$$|n\vartheta - p - \alpha| \le |v\vartheta - u - \alpha| + |q\vartheta - r| < \frac{1}{q} + \frac{1}{q} = \frac{2}{q} \le \frac{3}{n},$$

by (23.2.1), (23.2.5), and (23.2.6).

It is possible to refine upon the 3 of the theorem, but not, by this method,
in a very interesting way. We return to this question in Ch. XXIV.
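The proof of Theorem 440 is constructive, and its steps can be followed numerically. The sketch below is only an illustration under stated assumptions: $\vartheta$ is supplied as a floating-point number (so it is reliable only for moderate $N$), the convergent $r/q$ is obtained from the continued fraction of $\vartheta$, and the function names are hypothetical. The congruence $vr \equiv Q \pmod q$ is solved with Python's built-in modular inverse.

```python
import math

def convergent_with_large_denominator(theta, bound):
    """Return a continued-fraction convergent r/q of theta with q > bound.

    Assumes theta is (a float approximation to) an irrational number."""
    r_prev, r = 1, math.floor(theta)
    q_prev, q = 0, 1
    x = theta - math.floor(theta)
    while q <= bound:
        x = 1.0 / x
        a = math.floor(x)
        x -= a
        r_prev, r = r, a * r + r_prev
        q_prev, q = q, a * q + q_prev
    return r, q

def kronecker_one_dim(theta, alpha, N):
    """Follow (23.2.1)-(23.2.6): return (n, p) with n > N and
    |n*theta - p - alpha| < 3/n."""
    r, q = convergent_with_large_denominator(theta, 2 * N)  # (23.2.1)
    Q = round(q * alpha)                                    # (23.2.2)
    v = (Q * pow(r, -1, q)) % q                             # v*r = Q (mod q)
    if v > q // 2:                                          # enforce |v| <= q/2, (23.2.4)
        v -= q
    u = (v * r - Q) // q                                    # exact division, (23.2.3)
    return q + v, r + u
```

For example, with $\vartheta = \sqrt2$, $\alpha = 0.3$ and $N = 10$ the convergent used is $41/29$, and the construction gives $n = 37$, $p = 52$, for which $|n\vartheta - p - \alpha|$ is about $0.026 < 3/37$.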

23.3. The problem of the reflected ray. Before we pass to the general
proof of Kronecker’s theorem, we shall apply the special case already
proved to a simple but entertaining problem of plane geometry solved by
König and Szücs.

The sides of a square are reflecting mirrors. A ray of light leaves a point
inside the square and is reflected repeatedly in the mirrors. What is the
nature of its path?†

Theorem 441. Either the path is closed and periodic or it is dense in the
square, passing arbitrarily near to every point of the square. A necessary
and sufficient condition for periodicity is that the angle between a side
of the square and the initial direction of the ray should have a rational
tangent.

In Fig. 9 the parallels to the axes are the lines

$$x = l + \tfrac{1}{2}, \quad y = m + \tfrac{1}{2},$$

where $l$ and $m$ are integers. The thick square, of side 1, round the origin is
the square of the problem and $P$, or $(a, b)$, is the starting-point. We construct
all images of $P$ in the mirrors, for direct or repeated reflection. A moment’s
thought will show that they are of four types, the coordinates of the images
of the different types being

(A) $a + 2l,\ b + 2m$; (B) $a + 2l,\ -b + 2m + 1$;

(C) $-a + 2l + 1,\ b + 2m$; (D) $-a + 2l + 1,\ -b + 2m + 1$;

where $l$ and $m$ are arbitrary integers.‡ Further, if the velocity at $P$ has
direction cosines $\lambda$, $\mu$, then the corresponding images of the velocity have
direction cosines

(A) $\lambda, \mu$; (B) $\lambda, -\mu$; (C) $-\lambda, \mu$; (D) $-\lambda, -\mu$.

We may suppose, on grounds of symmetry, that $\mu$ is positive.

† It may happen exceptionally that the ray passes through a corner of the square. In this case we
assume that it returns along its former path. This is the convention suggested by considerations of
continuity.

‡ The $x$-coordinate takes all values derived from $a$ by the repeated use of the substitutions $x' = 1 - x$
and $x' = -1 - x$. The figure shows the images corresponding to non-negative $l$ and $m$.

If we think of the plane as divided into squares of unit side, the interior
of a typical square being

(23.3.1) $l - \tfrac{1}{2} < x < l + \tfrac{1}{2}, \quad m - \tfrac{1}{2} < y < m + \tfrac{1}{2},$

then each square contains just one image of every point in the original
square

$$-\tfrac{1}{2} < x < \tfrac{1}{2}, \quad -\tfrac{1}{2} < y < \tfrac{1}{2};$$

and, if the image in (23.3.1) of any point in the original square is of type
A, B, C, or D, then the image in (23.3.1) of any other point in the original
square is of the same type.

We now imagine $P$ moving with the ray. When $P$ meets a mirror at $Q$, it
coincides with an image; and the image of $P$ which momentarily coincides
with $P$ continues the motion of $P$, in its original direction, in one of the
squares adjacent to the fundamental square. We follow the motion of the
image, in this square, until it in its turn meets a side of the square. It is
plain that the original path of $P$ will be continued indefinitely in the same
line $L$, by a series of different images.

The segment of $L$ in any square (23.3.1) is the image of a straight portion
of the path of $P$ in the original square. There is a one-to-one correspondence
between the segments of $L$, in different squares (23.3.1), and the portions
of the path of $P$ between successive reflections, each segment of $L$ being
an image of the corresponding portion of the path of $P$.

The path of $P$ in the original square will be periodic if $P$ returns to its
original position moving in the same direction; and this will happen if
and only if $L$ passes through an image of type A of the original $P$. The
coordinates of an arbitrary point of $L$ are

$$x = a + \lambda t, \quad y = b + \mu t.$$

Hence the path will be periodic if and only if

$$\lambda t = 2l, \quad \mu t = 2m$$

for some $t$ and integral $l$, $m$; i.e. if $\lambda/\mu$ is rational.

It remains to show that, when $\lambda/\mu$ is irrational, the path of $P$ approaches
arbitrarily near to every point $(\xi, \eta)$ of the square. It is necessary and
sufficient for this that $L$ should pass arbitrarily near to some image of $(\xi, \eta)$
and sufficient that it should pass near some image of $(\xi, \eta)$ of type A, and
this will be so if

(23.3.2) $|a + \lambda t - \xi - 2l| < \epsilon, \quad |b + \mu t - \eta - 2m| < \epsilon$

for every $\xi$ and $\eta$, any positive $\epsilon$, some positive $t$, and appropriate integral
$l$ and $m$.

We take

$$t = \frac{\eta + 2m - b}{\mu},$$

when the second of (23.3.2) is satisfied automatically. The first inequality
then becomes

(23.3.3) $|m\vartheta - \omega - l| < \tfrac{1}{2}\epsilon,$

where

$$\vartheta = \frac{\lambda}{\mu}, \quad \omega = (b - \eta)\frac{\lambda}{2\mu} - \tfrac{1}{2}(a - \xi).$$

Theorem 438 shows that, when $\vartheta$ is irrational, there are $l$ and $m$, large
enough to make $t$ positive, which satisfy (23.3.3).
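The unfolding argument can be tried out with exact rational arithmetic. The sketch below is an illustration, not part of the original text: `fold` (a hypothetical name) sends a coordinate of the straight line $L$ back to the corresponding coordinate in the square $-\frac12 \le x \le \frac12$, which is the inverse of forming the images of types A to D. The direction is given by a pair of direction ratios `(lam, mu)` rather than normalized direction cosines, which only rescales $t$.

```python
from fractions import Fraction

def fold(x):
    """Map a coordinate on the unfolded line L back into [-1/2, 1/2].

    Images a + 2l are returned unchanged (as a); images -a + 2l + 1
    are reflected back to a."""
    u = (x + Fraction(1, 2)) % 2
    return u - Fraction(1, 2) if u <= 1 else Fraction(3, 2) - u

def position(a, b, lam, mu, t):
    """Position in the square, at parameter t, of a ray leaving (a, b)
    with direction ratios (lam, mu)."""
    return fold(a + lam * t), fold(b + mu * t)

def returns_to_start(lam, mu, t):
    """True if L meets an image of type A at parameter t,
    i.e. lam*t = 2l and mu*t = 2m for integers l, m."""
    return (lam * t) % 2 == 0 and (mu * t) % 2 == 0
```

With $P = (\frac14, -\frac13)$ and direction ratios $(1, 2)$, so that $\lambda/\mu$ is rational, the ray is at $(-\frac14, -\frac13)$ when $t = 1$ and is back at $P$, moving in its original direction, when $t = 2$.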

23.4. Statement of the general theorem. We pass to the general problem
in space of $k$ dimensions. The numbers $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$ are given, and
we wish to approximate to an arbitrary set of numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$,
integers apart, by equal multiples of $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$. It is plain, after § 23.1,
that the $\vartheta$ must be irrational, but this condition is not a sufficient condition
for the possibility of the approximation.

Suppose for example, to fix our ideas, that $k = 2$, that $\vartheta$, $\varphi$, $\alpha$, $\beta$ are
positive and less than 1, and that $\vartheta$ and $\varphi$ (whether rational or irrational)
satisfy a relation

$$a\vartheta + b\varphi + c = 0$$

with integral $a$, $b$, $c$. Then

$$a \cdot n\vartheta + b \cdot n\varphi$$

and

$$a(n\vartheta) + b(n\varphi)$$

are integers, and the point whose coordinates are $(n\vartheta)$ and $(n\varphi)$ lies on one or
other of a finite number of straight lines.
Thus Fig. 10 shows the case $a = 2$, $b = 3$,
when the point lies on one or other of
the lines $2x + 3y = \nu$ ($\nu = 1, 2, 3, 4$). It
is plain that, if $(\alpha, \beta)$ does not lie on
one of these lines, it is impossible to
approximate to it with more than a certain
accuracy.
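The case $a = 2$, $b = 3$ is easy to check numerically. In the sketch below, which is only an illustration, the particular values $\vartheta = \sqrt2 - 1$ and $\varphi = (1 - 2\vartheta)/3$ are assumptions chosen so that $2\vartheta + 3\varphi - 1 = 0$; the function name is hypothetical, and floating-point arithmetic limits the check to moderate $n$.

```python
import math

def frac(x):
    """Fractional part (x) = x - [x]."""
    return x - math.floor(x)

def line_values(theta, phi, a, b, n_max):
    """Values of a*(n*theta) + b*(n*phi) for n = 1..n_max, where (x)
    denotes the fractional part."""
    return [a * frac(n * theta) + b * frac(n * phi) for n in range(1, n_max + 1)]

theta = math.sqrt(2) - 1
phi = (1 - 2 * theta) / 3      # chosen so that 2*theta + 3*phi - 1 = 0
values = line_values(theta, phi, 2, 3, 500)
lines_hit = {round(v) for v in values}
```

Every entry of `values` is, up to rounding error, one of the integers 1, 2, 3, 4, so that every point $((n\vartheta), (n\varphi))$ lies on one of the four lines $2x + 3y = \nu$.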

We shall say that a set of numbers

$$\xi_1, \xi_2, \ldots, \xi_r$$

[Fig. 10]

is linearly independent if no linear relation

$$a_1\xi_1 + a_2\xi_2 + \cdots + a_r\xi_r = 0,$$

with integral coefficients, not all zero, holds between them. Thus, if
$p_1, p_2, \ldots, p_r$ are different primes, then

$$\log p_1, \log p_2, \ldots, \log p_r$$

are linearly independent; for

$$a_1 \log p_1 + a_2 \log p_2 + \cdots + a_r \log p_r = 0$$

is equivalent to

$$p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} = 1,$$

which contradicts the fundamental theorem of arithmetic.
We now state Kronecker’s theorem in its general form.
We now state Kronecker’s theorem in its general form. 

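Theorem 442. If

$$\vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1$$

are linearly independent, $\alpha_1, \alpha_2, \ldots, \alpha_k$ are arbitrary, and $N$ and $\epsilon$ are
positive, then there are integers

$$n > N, \quad p_1, p_2, \ldots, p_k$$

such that

$$|n\vartheta_m - p_m - \alpha_m| < \epsilon \quad (m = 1, 2, \ldots, k).$$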

We can also state the theorem in a form corresponding to Theorem 439,
but for this we must extend the definitions of § 9.10 to $k$-dimensional space.

If the coordinates of a point $P$ of $k$-dimensional space are $x_1, x_2, \ldots, x_k$,
and $\delta$ is positive, then the set of points $x'_1, x'_2, \ldots, x'_k$ for which

$$|x'_m - x_m| \le \delta \quad (m = 1, 2, \ldots, k)$$

is called a neighbourhood of $P$. The phrases limit point, derivative, closed,
dense in itself, and perfect are then defined exactly as in § 9.10. Finally, if
we describe the set defined by

$$0 \le x_m < 1 \quad (m = 1, 2, \ldots, k)$$

as the ‘unit cube’, then a set of points $S$ is dense in the unit cube if every
point of the cube is a point of the derived set $S'$.

Theorem 443. If $\vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1$ are linearly independent, then the set
of points

$$(n\vartheta_1), (n\vartheta_2), \ldots, (n\vartheta_k)$$

is dense in the unit cube.
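Theorems 442 and 443 assert only that a suitable $n$ exists, but for small $k$ one can be found by direct search. The following sketch is an illustration under assumptions: the function names and the cut-off `n_max` are hypothetical, and the $\vartheta_m$ are floating-point approximations, so only moderate $n$ are meaningful.

```python
import math

def dist_to_nearest_integer(x):
    """Distance from x to the nearest integer, i.e. min over p of |x - p|."""
    return abs(x - round(x))

def find_n(thetas, alphas, eps, N, n_max=10**6):
    """Smallest n with N < n <= n_max such that every n*theta_m - alpha_m
    is within eps of an integer p_m; None if the search range is exhausted."""
    for n in range(N + 1, n_max + 1):
        if all(dist_to_nearest_integer(n * th - al) < eps
               for th, al in zip(thetas, alphas)):
            return n
    return None

thetas = (math.sqrt(2), math.sqrt(3))   # sqrt(2), sqrt(3), 1 are linearly independent
alphas = (0.5, 0.25)
n = find_n(thetas, alphas, eps=0.05, N=100)
```

The integers $p_m$ of the theorem are then the integers nearest to $n\vartheta_m - \alpha_m$.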

23.5. The two forms of the theorem. There is an alternative form of 
Kronecker’s theorem in which both hypothesis and conclusion assert a 
little less. 

Theorem 444. If $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$ are linearly independent, $\alpha_1, \alpha_2, \ldots, \alpha_k$
are arbitrary, and $T$ and $\epsilon$ are positive, then there is a real number $t$, and
integers $p_1, p_2, \ldots, p_k$, such that

$$t > T$$

and

$$|t\vartheta_m - p_m - \alpha_m| < \epsilon \quad (m = 1, 2, \ldots, k).$$

The fundamental hypothesis in Theorem 444 is weaker than in Theorem
442, since it only concerns linear relations homogeneous in the $\vartheta$. Thus
$\vartheta_1 = \sqrt{2}$, $\vartheta_2 = 1$ satisfy the condition of Theorem 444 but not that of
Theorem 442; and, in Theorem 444, just one of the $\vartheta$ may be rational. The
conclusion is also weaker, because $t$ is not necessarily integral.

It is easy to prove that the two theorems are equivalent. It is useful to 
have both forms, since some proofs lead most naturally to one form and 
some to the other. 

(1) Theorem 444 implies Theorem 442. We suppose, as we may, that
every $\vartheta$ lies in (0, 1) and that $\epsilon < 1$. We apply Theorem 444, with $k + 1$
for $k$, $N + 1$ for $T$, and $\frac{1}{2}\epsilon$ for $\epsilon$, to the systems

$$\vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1; \quad \alpha_1, \alpha_2, \ldots, \alpha_k, 0.$$

The hypothesis of linear independence is then that of Theorem 442; and
the conclusion is expressed by

(23.5.1) $t > N + 1,$

(23.5.2) $|t\vartheta_m - p_m - \alpha_m| < \tfrac{1}{2}\epsilon \quad (m = 1, 2, \ldots, k),$

(23.5.3) $|t - p_{k+1}| < \tfrac{1}{2}\epsilon.$

From (23.5.1) and (23.5.3) it follows that $p_{k+1} > N$, and from (23.5.2) and
(23.5.3) that

$$|p_{k+1}\vartheta_m - p_m - \alpha_m| \le |t\vartheta_m - p_m - \alpha_m| + |t - p_{k+1}| < \epsilon.$$

These are the conclusions of Theorem 442, with $n = p_{k+1}$.

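(2) Theorem 442 implies Theorem 444. We now deduce Theorem 444
from Theorem 442. We observe first that Kronecker’s theorem (in either
form) is ‘additive in the $\alpha$’; if the result is true for a set of $\vartheta$ and for
$\alpha_1, \ldots, \alpha_k$, and also for the same set of $\vartheta$ and for $\beta_1, \ldots, \beta_k$, then it is
true for the same $\vartheta$ and for $\alpha_1 + \beta_1, \ldots, \alpha_k + \beta_k$. For if the differences of
$p\vartheta$ from $\alpha$, and of $q\vartheta$ from $\beta$, are nearly integers, then the difference of
$(p + q)\vartheta$ from $\alpha + \beta$ is nearly an integer.

If $\vartheta_1, \vartheta_2, \ldots, \vartheta_{k+1}$ are linearly independent, then so are

$$\frac{\vartheta_1}{\vartheta_{k+1}}, \ldots, \frac{\vartheta_k}{\vartheta_{k+1}}, 1.$$

We apply Theorem 442, with $N = T$, to the system

$$\frac{\vartheta_1}{\vartheta_{k+1}}, \ldots, \frac{\vartheta_k}{\vartheta_{k+1}}; \quad \alpha_1, \ldots, \alpha_k.$$

There are integers $n > N$, $p_1, \ldots, p_k$ such that

(23.5.4) $\left| \frac{n\vartheta_m}{\vartheta_{k+1}} - p_m - \alpha_m \right| < \epsilon \quad (m = 1, 2, \ldots, k).$

If we take $t = n/\vartheta_{k+1}$, then the inequalities (23.5.4) are $k$ of those required,
and

$$|t\vartheta_{k+1} - n| = 0 < \epsilon.$$

Also $t \ge n > N = T$. We thus obtain Theorem 444, for

$$\vartheta_1, \ldots, \vartheta_k, \vartheta_{k+1}; \quad \alpha_1, \ldots, \alpha_k, 0.$$

We can prove it similarly for

$$\vartheta_1, \ldots, \vartheta_k, \vartheta_{k+1}; \quad 0, \ldots, 0, \alpha_{k+1},$$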

and the full theorem then follows from the remark at the beginning of (2).

23.6. An illustration. Kronecker’s theorem is one of those mathematical theorems
which assert, roughly, that ‘what is not impossible will happen sometimes however
improbable it may be’. We can illustrate this ‘astronomically’.

Suppose that $k$ spherical planets revolve round a point $O$ in concentric coplanar circles,
their angular velocities being $2\pi\omega_1, 2\pi\omega_2, \ldots, 2\pi\omega_k$, that there is an observer at $O$, and
that the apparent diameter of the inmost planet $P$, observed from $O$, is greater than that of
any outer planet.

If the planets are all in conjunction at time $t = 0$ (so that $P$ occults all the other planets),
then their angular coordinates at time $t$ are $2\pi t\omega_1, \ldots$. Theorem 201 shows that we can choose
a $t$, as large as we please, for which all these angles are as near as we please to integral
multiples of $2\pi$. Hence occultation of the whole system by $P$ will recur continually. This
conclusion holds for all angular velocities.

If the angular coordinates are initially $\alpha_1, \alpha_2, \ldots, \alpha_k$, then such an occultation may never
occur. For example, two of the planets might be originally in opposition and have equal
angular velocities. Suppose, however, that the angular velocities are linearly independent.
Then Theorem 444 shows that, for appropriate $t$, as large as we please, all of

$$2\pi t\omega_1 + \alpha_1, \ldots, 2\pi t\omega_k + \alpha_k$$

will be as near as we please to multiples of $2\pi$; and then occultations will recur whatever
the initial positions.


23.7. Lettenmeyer’s proof of the theorem. We now suppose that
$k = 2$, and prove Kronecker’s theorem in this case by a ‘geometrical’
method due to Lettenmeyer. When $k = 1$, Lettenmeyer’s argument reduces
to that used in § 23.2 (ii).

We take the first form of the theorem, and write $\vartheta$, $\varphi$ for $\vartheta_1$, $\vartheta_2$. We may
suppose

$$0 < \vartheta < 1, \quad 0 < \varphi < 1;$$

and we have to show that if $\vartheta$, $\varphi$, 1 are linearly independent then the points
$P_n$ whose coordinates are

$$(n\vartheta),\ (n\varphi) \quad (n = 1, 2, \ldots)$$

are dense in the unit square. No two $P_n$ coincide, and no $P_n$ lies on a side
of the square.

We call the directed stretch

$$P_n P_{n+r} \quad (n > 0,\ r > 0)$$

a vector. If we take any point $P_m$, and draw a vector $P_m Q$ equal and parallel
to the vector $P_n P_{n+r}$, then the other end $Q$ of this vector is a point of the set
(and in fact $P_{m+r}$). Here naturally we adopt the convention corresponding
to that of § 23.2 (ii), viz. that, if $P_m Q$ meets a side of the square, then
it is continued in the same direction from the corresponding point on the
opposite side of the square.

Since no two points $P_n$ coincide, the set $(P_n)$ has a limit point; there
are therefore vectors whose length is less than any positive $\epsilon$, and vectors
of this kind for which $r$ is as large as we please. We call these vectors
$\epsilon$-vectors. There are $\epsilon$-vectors, and $\epsilon$-vectors with arbitrarily large $r$, issuing
from every $P_n$, and in particular from $P_1$. If

$$\epsilon < \min(\vartheta, \varphi, 1 - \vartheta, 1 - \varphi),$$

then all $\epsilon$-vectors issuing from $P_1$ are unbroken, i.e. do not meet a side of
the square.

Two cases are possible a priori.

(1) There are two $\epsilon$-vectors which are not parallel.† In this case we mark
them off from $P_1$ and construct the lattice based upon $P_1$ and the two other
ends of the vectors. Every point of the square is then within a distance $\epsilon$ of
some lattice point, and the theorem follows.

† In the sense of elementary geometry, where we do not distinguish two directions on one straight
line.

(2) All $\epsilon$-vectors are parallel. In this case all $\epsilon$-vectors issuing from $P_1$
lie along the same straight line, and there are points $P_r$, $P_s$ on this line with
arbitrarily large suffixes $r$, $s$. Since $P_1$, $P_r$, $P_s$ are collinear,

$$0 = \begin{vmatrix} \vartheta & \varphi & 1 \\ (r\vartheta) & (r\varphi) & 1 \\ (s\vartheta) & (s\varphi) & 1 \end{vmatrix} = \begin{vmatrix} \vartheta & \varphi & 1 \\ r\vartheta - [r\vartheta] & r\varphi - [r\varphi] & 1 \\ s\vartheta - [s\vartheta] & s\varphi - [s\varphi] & 1 \end{vmatrix},$$

and so

$$\begin{vmatrix} \vartheta & \varphi & 1 \\ [r\vartheta] & [r\varphi] & r - 1 \\ [s\vartheta] & [s\varphi] & s - 1 \end{vmatrix} = 0,$$

or

$$a\vartheta + b\varphi + c = 0,$$

where $a$, $b$, $c$ are integers. But $\vartheta$, $\varphi$, 1 are linearly independent, and therefore
$a$, $b$, $c$ are all zero. Hence, in particular,

$$\begin{vmatrix} [r\varphi] & r - 1 \\ [s\varphi] & s - 1 \end{vmatrix} = 0,$$

or

$$\frac{[s\varphi]}{s - 1} = \frac{[r\varphi]}{r - 1}.$$

We can make $s \to \infty$, since there are $P_s$ with arbitrarily large $s$; and we
then obtain

$$\varphi = \lim \frac{[s\varphi]}{s - 1} = \frac{[r\varphi]}{r - 1},$$

which is impossible because $\varphi$ is irrational.

It follows that case (2) is impossible, so that the theorem is proved.

23.8. Estermann’s proof of the theorem. Lettenmeyer’s argument 
may be extended to space of k dimensions, and leads to a general proof of 
Kronecker’s theorem; but the ideas which underlie it are illustrated ade- 
quately in the two-dimensional case. In this and the next section we prove 
the general theorem by two other quite different methods. 

Estermann’s proof is inductive. His argument shows that the theorem is
true in space of $k$ dimensions if it is true in space of $k - 1$. It also shows
incidentally that the theorem is true in one-dimensional space, so that the
proof is self-contained; but this we have proved already, and the reader
may, if he pleases, take it for granted.

The theorem in its first form states that, if $\vartheta_1, \vartheta_2, \ldots, \vartheta_k, 1$ are linearly
independent, $\alpha_1, \alpha_2, \ldots, \alpha_k$ are arbitrary, and $\epsilon$ and $\omega$ are positive, then
there are integers $n, p_1, p_2, \ldots, p_k$ such that

(23.8.1) $n > \omega$

and

(23.8.2) $|n\vartheta_m - p_m - \alpha_m| < \epsilon \quad (m = 1, 2, \ldots, k).$

Here the emphasis is on large positive values of $n$. It is convenient now
to modify the enunciation a little, and consider both positive and negative
values of $n$. We therefore assert a little more, viz. that, given a positive $\epsilon$
and $\omega$, and a $\lambda$ of either sign, then we can choose $n$ and the $p$ to satisfy
(23.8.2) and

(23.8.3) $|n| > \omega, \quad \operatorname{sign} n = \operatorname{sign} \lambda,$

the second equation meaning that $n$ has the same sign as $\lambda$. We have to
show (a) that this is true for $k$ if it is true for $k - 1$, and (b) that it is true
when $k = 1$.

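There are, by Theorem 201, integers

$$s > 0, \quad b_1, b_2, \ldots, b_k$$

such that

(23.8.4) $|s\vartheta_m - b_m| < \tfrac{1}{2}\epsilon \quad (m = 1, 2, \ldots, k).$

Since $\vartheta_k$ is irrational, $s\vartheta_k - b_k \ne 0$; and the $k$ numbers

$$\varphi_m = \frac{s\vartheta_m - b_m}{s\vartheta_k - b_k}$$

(of which the last is 1) are linearly independent, since a linear relation
between them would involve one between $\vartheta_1, \ldots, \vartheta_k, 1$.

Suppose first that $k > 1$, and assume the truth of the theorem for $k - 1$.
We apply the theorem, with $k - 1$ for $k$, to the system

$\varphi_1, \varphi_2, \ldots, \varphi_{k-1}$ (for $\vartheta_1, \vartheta_2, \ldots, \vartheta_{k-1}$),

$\beta_1 = \alpha_1 - \alpha_k\varphi_1,\ \beta_2 = \alpha_2 - \alpha_k\varphi_2,\ \ldots,\ \beta_{k-1} = \alpha_{k-1} - \alpha_k\varphi_{k-1}$
(for $\alpha_1, \alpha_2, \ldots, \alpha_{k-1}$),

$\tfrac{1}{2}\epsilon$ (for $\epsilon$), $\lambda(s\vartheta_k - b_k)$ (for $\lambda$),

(23.8.5) $\Omega = (\omega + 1)|s\vartheta_k - b_k| + |\alpha_k|$ (for $\omega$).

There are integers $c_k, c_1, c_2, \ldots, c_{k-1}$ such that

(23.8.6) $|c_k| > \Omega, \quad \operatorname{sign} c_k = \operatorname{sign}\{\lambda(s\vartheta_k - b_k)\},$

and

(23.8.7) $|c_k\varphi_m - c_m - \beta_m| < \tfrac{1}{2}\epsilon \quad (m = 1, 2, \ldots, k - 1).$

The inequality (23.8.7), when expressed in terms of the $\vartheta$, is

(23.8.8) $\left| \frac{c_k + \alpha_k}{s\vartheta_k - b_k}(s\vartheta_m - b_m) - c_m - \alpha_m \right| < \tfrac{1}{2}\epsilon \quad (m = 1, 2, \ldots, k).$

Here we have included the value $k$ of $m$, as we may do because the left-hand
side of (23.8.8) vanishes when $m = k$.

We have supposed $k > 1$. When $k = 1$, (23.8.8) is trivial, and we have
only to choose $c_k$ to satisfy (23.8.6), as plainly we may.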

We now choose an integer $N$ so that

(23.8.9) $\left| N - \frac{c_k + \alpha_k}{s\vartheta_k - b_k} \right| < 1,$

and take

$$n = Ns, \quad p_m = Nb_m + c_m.$$

Then

$$|n\vartheta_m - p_m - \alpha_m| = |N(s\vartheta_m - b_m) - c_m - \alpha_m|$$

$$\le \left| \frac{c_k + \alpha_k}{s\vartheta_k - b_k}(s\vartheta_m - b_m) - c_m - \alpha_m \right| + |s\vartheta_m - b_m|$$

$$< \tfrac{1}{2}\epsilon + \tfrac{1}{2}\epsilon = \epsilon \quad (m = 1, 2, \ldots, k),$$

by (23.8.4), (23.8.8), and (23.8.9). This is (23.8.2). Next

(23.8.10) $\left| \frac{c_k + \alpha_k}{s\vartheta_k - b_k} \right| \ge \frac{|c_k| - |\alpha_k|}{|s\vartheta_k - b_k|} > \omega + 1,$

by (23.8.5) and (23.8.6); so that $|N| > \omega$ and

$$|n| = |N|s \ge |N| > \omega.$$

Finally, $n$ has the sign of $N$, and so, after (23.8.9) and (23.8.10), the sign of

$$\frac{c_k}{s\vartheta_k - b_k}.$$

This, by (23.8.6), is the sign of $\lambda$.

Hence $n$ and the $p$ satisfy all our demands, and the induction from $k - 1$
to $k$ is established.

23.9. Bohr’s proof of the theorem. There are also a number of
‘analytical’ proofs of Kronecker’s theorem, of which perhaps the simplest is
one due to Bohr. All such proofs depend on the facts that

$$e(x) = e^{2\pi i x}$$

has the period 1 and is equal to 1 if and only if $x$ is an integer.

We observe first that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T e^{cit}\, dt = \lim_{T \to \infty} \frac{e^{ciT} - 1}{ciT} = 0$$

if $c$ is real and not zero, and is 1 if $c = 0$. It follows that, if

(23.9.1) $\chi(t) = \sum_{\nu=1}^{r} b_\nu e^{c_\nu i t},$

where no two $c_\nu$ are equal, then

(23.9.2) $b_\nu = \lim_{T \to \infty} \frac{1}{T} \int_0^T \chi(t) e^{-c_\nu i t}\, dt.$

We take the second form of Kronecker’s theorem (Theorem 444), and
consider the function

(23.9.3) $\phi(t) = |F(t)|,$

where

(23.9.4) $F(t) = 1 + \sum_{m=1}^{k} e(\vartheta_m t - \alpha_m),$

of the real variable $t$. Obviously

$$\phi(t) \le k + 1.$$

If Kronecker’s theorem is true, we can find a large $t$ for which every term
in the sum is nearly 1 and $\phi(t)$ is nearly $k + 1$. Conversely, if $\phi(t)$ is nearly
$k + 1$ for some large $t$, then (since no term can exceed 1 in absolute value)
every term must be nearly 1 and Kronecker’s theorem must be true. We
shall therefore have proved Kronecker’s theorem if we can prove that

(23.9.5) $\limsup_{t \to \infty} \phi(t) = k + 1.$

The proof is based on certain formal relations between $F(t)$ and the
function

(23.9.6) $\psi(x_1, x_2, \ldots, x_k) = 1 + x_1 + x_2 + \cdots + x_k$

of the $k$ variables $x$. If we raise $\psi$ to the $p$th power by the multinomial
theorem, we obtain

(23.9.7) $\psi^p = \sum a_{n_1, n_2, \ldots, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}.$

Here the coefficients $a$ are positive; their individual values are irrelevant,
but their sum is

(23.9.8) $\sum a = \psi^p(1, 1, \ldots, 1) = (k + 1)^p.$

We also require an upper bound for their number. There are $p + 1$ of them
when $k = 1$; and

$$(1 + x_1 + \cdots + x_k)^p = (1 + x_1 + \cdots + x_{k-1})^p + \binom{p}{1}(1 + x_1 + \cdots + x_{k-1})^{p-1} x_k + \cdots + x_k^p,$$

so that the number is multiplied at most by $p + 1$ when we pass from $k - 1$
to $k$. Hence the number of the $a$ does not exceed $(p + 1)^k$.†
We now form the corresponding power

$$F^p = \{1 + e(\vartheta_1 t - \alpha_1) + \cdots + e(\vartheta_k t - \alpha_k)\}^p$$

of $F$. This is a sum of the form (23.9.1), obtained by replacing $x_r$ in (23.9.7)
by $e(\vartheta_r t - \alpha_r)$. When we do this, every product $x_1^{n_1} \cdots x_k^{n_k}$ in (23.9.7) will
give rise to a different $c_\nu$, since the equality of two $c_\nu$ would imply a linear
relation between the $\vartheta$.‡ It follows that every coefficient $b_\nu$ has an absolute
value equal to the corresponding coefficient $a$, and that

$$\sum |b_\nu| = \sum a = (k + 1)^p.$$

† The actual number is $\binom{p+k}{k}$.

‡ It is here only that we use the linear independence of the $\vartheta$, and this is naturally the kernel of the
proof.

Suppose now that, in contradiction to (23.9.5),

(23.9.9) $\limsup_{t \to \infty} \phi(t) < k + 1.$

Then there is a $\lambda$ and a $t_0$ such that, for $t \ge t_0$,

$$|F(t)| \le \lambda < k + 1,$$

and

$$\limsup_{T \to \infty} \frac{1}{T} \int_0^T |F(t)|^p\, dt \le \lim_{T \to \infty} \frac{1}{T} \int_0^T \lambda^p\, dt = \lambda^p.$$

Hence

$$|b_\nu| = \left| \lim_{T \to \infty} \frac{1}{T} \int_0^T \{F(t)\}^p e^{-c_\nu i t}\, dt \right| \le \limsup_{T \to \infty} \frac{1}{T} \int_0^T |F(t)|^p\, dt \le \lambda^p;$$

and therefore $a \le \lambda^p$ for every $a$. Hence, since there are at most $(p + 1)^k$
of the $a$, we deduce

$$(k + 1)^p = \sum a \le (p + 1)^k \lambda^p,$$

(23.9.10) $\left( \frac{k + 1}{\lambda} \right)^p \le (p + 1)^k.$

But $\lambda < k + 1$, and so

$$\frac{k + 1}{\lambda} = e^{\delta},$$

where $\delta > 0$. Thus

$$e^{\delta p} \le (p + 1)^k,$$

which is impossible for large $p$ because

$$e^{-\delta p}(p + 1)^k \to 0$$

when $p \to \infty$. Hence (23.9.9) involves a contradiction for large $p$, and this
proves the theorem.
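The equivalence between Kronecker’s theorem and (23.9.5) can be observed numerically. The sketch below is an illustration only: it evaluates $\phi(t) = |F(t)|$ on a grid of values of $t$ and reports the largest value found. The grid step, the range of $t$, and the sample values $\vartheta = (1, \sqrt2)$, $\alpha = (0.3, 0.7)$ are assumptions; the function names are hypothetical.

```python
import cmath
import math

def e(x):
    """e(x) = exp(2*pi*i*x), which has period 1."""
    return cmath.exp(2j * math.pi * x)

def phi(t, thetas, alphas):
    """phi(t) = |F(t)| with F(t) = 1 + sum over m of e(theta_m*t - alpha_m)."""
    return abs(1 + sum(e(th * t - al) for th, al in zip(thetas, alphas)))

def largest_phi(thetas, alphas, t_max, step=0.001):
    """Largest value of phi on the grid t = 0, step, 2*step, ..., up to t_max."""
    steps = int(t_max / step)
    return max(phi(j * step, thetas, alphas) for j in range(steps + 1))

thetas = (1.0, math.sqrt(2))    # 1 and sqrt(2) satisfy the hypothesis of Theorem 444
alphas = (0.3, 0.7)
best = largest_phi(thetas, alphas, t_max=200)
```

Here $k = 2$, and `best` comes out close to $k + 1 = 3$, as (23.9.5) requires. A grid search of this kind can only suggest the upper limit; it proves nothing.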

23.10. Uniform distribution. Kronecker’s theorem, important as it is,
does not tell the full truth about the sets of points $(n\vartheta)$ or $(n\vartheta_1), (n\vartheta_2), \ldots$
with which it is concerned. These sets are not merely dense in the unit
interval, or cube, but ‘uniformly distributed’.

Returning for the moment to one dimension, we say that a set of points
$P_n$ in (0, 1) is uniformly distributed if, roughly, every sub-interval of (0, 1)
contains its proper quota of points. To put the definition precisely, we
suppose that $I$ is a sub-interval of (0, 1), and use $I$ both for the interval and
for its length. If $n_I$ is the number of the points $P_1, P_2, \ldots, P_n$ which fall in
$I$, and

(23.10.1) $\frac{n_I}{n} \to I,$

whatever $I$, when $n \to \infty$, then the set is uniformly distributed. We can
also write (23.10.1) in either of the forms

(23.10.2) $n_I \sim nI, \quad n_I = nI + o(n).$

Theorem 445. If $\vartheta$ is irrational then the points $(n\vartheta)$ are uniformly
distributed in (0, 1).

Let $0 < \epsilon < \frac{1}{10}$. By Theorem 439, we can choose $j$ so that $0 < (j\vartheta) =
\delta < \epsilon$. We write $K = [1/\delta]$. If $0 \le h \le K$, the interval $I_h$ is that in which

$$(hj\vartheta) < x \le (\{h + 1\}j\vartheta).$$

Here $I_K$ extends beyond the point 1 and we are using the circular
representation of § 23.2 (iii). We denote by $\eta_h(n)$ the number of $(\vartheta), (2\vartheta), \ldots, (n\vartheta)$
which lie in $I_h$. If $(t\vartheta)$ lies in $I_0$, where $t$ is a positive integer, then $(\{t + hj\}\vartheta)$
lies in $I_h$, and conversely. Hence, if $n > hj$,

$$\eta_h(n) - \eta_h(hj) = \eta_0(n - hj).$$

Since $0 \le \eta_h(hj) \le hj$ and $\eta_0(n) - hj \le \eta_0(n - hj) \le \eta_0(n)$, it follows that

$$\eta_0(n) - hj \le \eta_h(n) \le \eta_0(n) + hj$$

and so

(23.10.3) $\lim_{n \to \infty} \frac{\eta_h(n)}{\eta_0(n)} = 1 \quad (0 \le h \le K).$

Now

$$\sum_{h=0}^{K-1} \eta_h(n) \le n \le \sum_{h=0}^{K} \eta_h(n)$$

and we deduce from (23.10.3) that

(23.10.4) $\frac{1}{K + 1} \le \liminf_{n \to \infty} \frac{\eta_0(n)}{n} \le \limsup_{n \to \infty} \frac{\eta_0(n)}{n} \le \frac{1}{K}.$

If $I$ is the interval $(\alpha, \beta)$ and $\beta - \alpha \ge \epsilon$, there are integers $u$, $k$ such that

$$0 \le (uj\vartheta) \le \alpha \le (\{u + 1\}j\vartheta) \le (\{u + k\}j\vartheta) \le \beta < (\{u + k + 1\}j\vartheta),$$

so that

$$\sum_{h=u+1}^{u+k-1} \eta_h(n) \le n_I \le \sum_{h=u}^{u+k} \eta_h(n).$$

Hence, by (23.10.3), we have

$$k - 1 \le \liminf_{n \to \infty} \frac{n_I}{\eta_0(n)} \le \limsup_{n \to \infty} \frac{n_I}{\eta_0(n)} \le k + 1$$

and so, using (23.10.4),

$$\frac{k - 1}{K + 1} \le \liminf_{n \to \infty} \frac{n_I}{n} \le \limsup_{n \to \infty} \frac{n_I}{n} \le \frac{k + 1}{K}.$$

But

$$K\delta \le 1 < (K + 1)\delta, \quad (k - 1)\delta \le I < (k + 1)\delta.$$

Hence

$$\frac{I - 2\delta}{1 + \delta} \le \liminf_{n \to \infty} \frac{n_I}{n} \le \limsup_{n \to \infty} \frac{n_I}{n} \le \frac{I + 2\delta}{1 - \delta}.$$

Since we can choose $\epsilon$ (and so $\delta$) as small as we please, (23.10.1) follows.

The definition of uniform distribution may be extended at once to space
of $k$ dimensions, and Kronecker’s general theorem may be sharpened in
the same way. But the proof is more complicated.

It is natural to inquire what happens in the exceptional cases when the
$\vartheta$ are connected by one or more linear relations. Suppose, to fix our ideas,
that $k = 3$. If there is one relation, the points $P_n$ are limited to certain
planes, as they were limited to certain lines in § 23.4; if there are two, they
are limited to lines. Analogy suggests that the distribution on these planes
or lines should be dense, and indeed uniform; and it can be proved that this
is so, and that the corresponding theorems in space of $k$ dimensions are
also true.
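The definition (23.10.1) and Theorem 445 lend themselves to a simple empirical check. The sketch below is an illustration: it counts how many of $(\vartheta), (2\vartheta), \ldots, (n\vartheta)$ fall in an interval and compares the proportion $n_I/n$ with the length $I$. The choices $\vartheta = \sqrt2$, $I = (0.2, 0.5)$ and $n = 10000$ are assumptions, and the function name is hypothetical.

```python
import math

def proportion_in_interval(theta, lo, hi, n):
    """n_I / n, where n_I counts the m in 1..n whose fractional part
    (m*theta) lies in the interval [lo, hi)."""
    count = sum(1 for m in range(1, n + 1) if lo <= (m * theta) % 1.0 < hi)
    return count / n

ratio = proportion_in_interval(math.sqrt(2), 0.2, 0.5, 10000)
```

For an irrational $\vartheta$ the ratio approaches $I = 0.3$ as $n$ grows. For a rational $\vartheta$ it need not: with $\vartheta = \frac12$ every fractional part is $0$ or $\frac12$, so none of the points falls in $(0.2, 0.5)$.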


NOTES 

§ 23.1. Kronecker first stated and proved his theorem in the Berliner Sitzungsberichte,
1884 [Werke, iii (i), 47-110]. For a fuller account and a bibliography of later work inspired
by the theorem, see Cassels, Diophantine approximation. The one-dimensional theorem
seems to be due to Tchebychef: see Koksma, 76.

§ 23.2. For proof (iii) see Hardy and Littlewood, Acta Math. 37 (1914), 155-91,
especially 161-2.

§ 23.3. König and Szücs, Rendiconti del circolo matematico di Palermo, 36 (1913),
79-90.

§ 23.7. Lettenmeyer, Proc. London Math. Soc. (2), 21 (1923), 306-14.

§ 23.8. Estermann, Journal London Math. Soc. 8 (1933), 18-20.

§ 23.9. H. Bohr, Journal London Math. Soc. 9 (1934), 5-6; for a variation see Proc.
London Math. Soc. (2) 21 (1923), 315-16. There is another simple proof by Bohr and Jessen
in Journal London Math. Soc. 7 (1932), 274-5.

§ 23.10. Theorem 445 seems to have been found independently, at about the same time,
by Bohl, Sierpiński, and Weyl. See Koksma, 92. The particular form of the proof given was
suggested by Dr. Miklavc (Proc. American Math. Soc. 39 (1973), 279-80).

The best proof of the theorem is no doubt that given by Weyl in a very important paper in
Math. Annalen, 77 (1916), 313-52. Weyl proves that a necessary and sufficient condition
for the uniform distribution of the numbers

$$(f(1)), (f(2)), (f(3)), \ldots$$

in (0, 1) is that

$$\sum_{\nu=1}^{n} e\{hf(\nu)\} = o(n)$$

for every integral $h$ other than 0. This principle has many important applications, particularly to the
problems mentioned at the end of the chapter.
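Weyl’s condition is easy to test numerically for a given sequence. The sketch below is an illustration: it computes the absolute value of the exponential sum divided by $n$, which tends to 0 for each non-zero $h$ when the sequence is uniformly distributed. The function name and the sample sequences are assumptions.

```python
import cmath
import math

def weyl_mean(f, h, n):
    """|sum over v = 1..n of e(h*f(v))| / n, where e(x) = exp(2*pi*i*x)."""
    total = sum(cmath.exp(2j * math.pi * h * f(v)) for v in range(1, n + 1))
    return abs(total) / n

irrational_case = [weyl_mean(lambda v: v * math.sqrt(2), h, 10000) for h in (1, 2, 3)]
rational_case = weyl_mean(lambda v: 0.5 * v, 2, 1000)
```

For $f(\nu) = \nu\sqrt2$ the three means are all below $0.001$, while for $f(\nu) = \frac12\nu$ and $h = 2$ every term of the sum is 1 and the mean is 1, so the condition fails.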

For a detailed account of the subject of uniform distribution, see Kuipers and
Niederreiter.
GEOMETRY OF NUMBERS


24.1. Introduction and restatement of the fundamental theorem.

This chapter is an introduction to the ‘geometry of numbers’, the subject
created by Minkowski on the basis of his fundamental Theorem 37
and its generalization in space of $n$ dimensions.

We shall need the $n$-dimensional generalizations of the notions which
we used in §§ 3.9-11; but these, as we said in § 3.11, are straightforward.
We define a lattice, and equivalence of lattices, as in § 3.5, parallelograms
being replaced by $n$-dimensional parallelepipeds; and a convex region as
in the first definition of § 3.9.† Minkowski’s theorem is then

Theorem 446. Any convex region in n-dimensional space, symmetrical
about the origin and of volume greater than $2^n$, contains a point with
integral coordinates, not all zero.

Any of the proofs of Theorem 37 in Ch. III may be adapted to prove
Theorem 446: we take, for example, Mordell’s. The planes

$$x_r = 2p_r/t \quad (r = 1, 2, \ldots, n)$$

divide space into cubes of volume $(2/t)^n$. If $N(t)$ is the number of corners
of these cubes in the region $R$ under consideration, and $V$ the volume of $R$,
then

$$(2/t)^n N(t) \to V$$

when $t \to \infty$; and $N(t) > t^n$ if $V > 2^n$ and $t$ is sufficiently large. The
proof may then be completed as before.

If $\xi_1, \xi_2, \ldots, \xi_n$ are linear forms in $x_1, x_2, \ldots, x_n$, say

(24.1.1) $\xi_r = \alpha_{r,1}x_1 + \alpha_{r,2}x_2 + \cdots + \alpha_{r,n}x_n \quad (r = 1, 2, \ldots, n),$

with real coefficients and determinant

(24.1.2) $\Delta = \begin{vmatrix} \alpha_{1,1} & \alpha_{1,2} & \cdots & \alpha_{1,n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n,1} & \alpha_{n,2} & \cdots & \alpha_{n,n} \end{vmatrix} \ne 0,$

then the points in $\xi$-space corresponding to integral $x_1, x_2, \ldots, x_n$ form a
lattice $\Lambda$:‡ we call $\Delta$ the determinant of the lattice. A region $R$ of $x$-space
is transformed into a region $P$ of $\xi$-space, and a convex $R$ into a convex $P$.§
Also

$$\int\!\!\int \cdots \int d\xi_1\, d\xi_2 \cdots d\xi_n = |\Delta| \int\!\!\int \cdots \int dx_1\, dx_2 \cdots dx_n,$$

so that the volume of $P$ is $|\Delta|$ times that of $R$. We can therefore restate
Theorem 446 in the form

Theorem 447. If $\Lambda$ is a lattice of determinant $\Delta$, and $P$ is a convex region
symmetrical about $O$ and of volume greater than $2^n|\Delta|$, then $P$ contains a
point of $\Lambda$ other than $O$.

We assume throughout the chapter that $\Delta \ne 0$.

† The second definition can also be adapted to $n$ dimensions, the line $l$ becoming an $(n - 1)$-dimensional
‘plane’ (whereas the line of the first definition remains a ‘line’). We shall use
three-dimensional language: thus we shall call the region $|x_1| < 1, |x_2| < 1, \ldots, |x_n| < 1$ the ‘unit
cube’.

24.2. Simple applications. The theorems which follow will all have
the same character. We shall be given a system of forms $\xi_r$, usually linear
and homogeneous, but sometimes (as in Theorem 455) non-homogeneous,
and we shall prove that there are integral values of the $x_r$ (usually not all 0)
for which the $\xi_r$ satisfy certain inequalities. We can obtain such theorems
at once by applying Theorem 447 to various simple regions $P$.

(1) Suppose first that $P$ is the region defined by

$$|\xi_1| < \lambda_1, \quad |\xi_2| < \lambda_2, \quad \ldots, \quad |\xi_n| < \lambda_n.$$

This is convex and symmetrical about $O$, and its volume is $2^n\lambda_1\lambda_2 \cdots \lambda_n$. If
$\lambda_1\lambda_2 \cdots \lambda_n > |\Delta|$, $P$ contains a lattice point other than $O$; if
$\lambda_1\lambda_2 \cdots \lambda_n \ge |\Delta|$, there is a lattice point, other than $O$, inside $P$ or on its boundary.¶
We thus obtain

Theorem 448. If $\xi_1, \xi_2, \ldots, \xi_n$ are homogeneous linear forms in
$x_1, x_2, \ldots, x_n$, with real coefficients and determinant $\Delta$, and $\lambda_1, \lambda_2, \ldots, \lambda_n$
are positive numbers for which

(24.2.1) $\lambda_1\lambda_2 \cdots \lambda_n \ge |\Delta|,$

then there are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.2) $|\xi_1| \le \lambda_1, \quad |\xi_2| \le \lambda_2, \quad \ldots, \quad |\xi_n| \le \lambda_n.$

In particular we can make $|\xi_r| \le \sqrt[n]{|\Delta|}$ for each $r$.

‡ In § 3.5 we used $L$ for a lattice of lines, $\Lambda$ for the corresponding point-lattice. It is more convenient
now to reserve Greek letters for configurations in ‘$\xi$-space’.

§ The invariance of convexity depends on two properties of linear transformations viz. (1) that lines
and planes are transformed into lines and planes, and (2) that the order of points on a line is unaltered.

¶ We pass here, by an appeal to continuity, from a result concerning an open region to one concerning
the corresponding closed region. We might, of course, make a similar change in the general theorems
446 and 447: thus any closed convex region, symmetrical about $O$, and of volume not less than $2^n$,
has a lattice point, other than $O$, inside it or on its boundary. We shall not again refer explicitly to such
trivial appeals to continuity.
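For $n = 2$ the conclusion of Theorem 448 can be verified in particular cases by an exhaustive search. The sketch below is an illustration under assumptions: the function name and the search radius are hypothetical, the non-strict inequalities are those of the closed region discussed above, and the sample forms $\xi_1 = x - \sqrt2\,y$, $\xi_2 = y$ (with $\Delta = 1$) are chosen only as an example.

```python
import math

def small_lattice_point(forms, lams, radius):
    """Search |x|, |y| <= radius for integers (x, y) != (0, 0) with
    |xi_1| <= lams[0] and |xi_2| <= lams[1], where
    xi_1 = a*x + b*y and xi_2 = c*x + d*y.  Returns None if none is found."""
    (a, b), (c, d) = forms
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            if (x, y) == (0, 0):
                continue
            if abs(a * x + b * y) <= lams[0] and abs(c * x + d * y) <= lams[1]:
                return x, y
    return None

forms = ((1.0, -math.sqrt(2)), (0.0, 1.0))   # determinant 1
lams = (0.1, 10.0)                            # product 1 >= |determinant|
point = small_lattice_point(forms, lams, radius=20)
```

The theorem guarantees that such a point exists. Here, for instance, $x = 7$, $y = 5$ gives $|7 - 5\sqrt2| < 0.1$ and $|y| \le 10$, so a point lies within the chosen search radius.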

(2) Secondly, suppose that $P$ is defined by

(24.2.3) $|\xi_1| + |\xi_2| + \cdots + |\xi_n| < \lambda.$

If $n = 2$, $P$ is a square; if $n = 3$, an octahedron. In the general case it consists
of $2^n$ congruent parts, one in each ‘octant’. It is obviously symmetrical
about $O$, and it is convex because

$$|\mu\xi + \mu'\xi'| \le \mu|\xi| + \mu'|\xi'|$$

for positive $\mu$ and $\mu'$. The volume in the positive octant $\xi_r > 0$ is

$$\lambda^n \int_0^1 d\xi_1 \int_0^{1-\xi_1} d\xi_2 \cdots \int_0^{1-\xi_1-\cdots-\xi_{n-1}} d\xi_n = \frac{\lambda^n}{n!}.$$

If $\lambda^n > n!\,|\Delta|$ then the volume of $P$ exceeds $2^n|\Delta|$, and there is a lattice
point, besides $O$, in $P$. Hence we obtain

Theorem 449. There are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.4) $|\xi_1| + |\xi_2| + \cdots + |\xi_n| \le (n!\,|\Delta|)^{1/n}.$

Since, by the theorem of the arithmetic and geometric means,

$$n|\xi_1\xi_2 \cdots \xi_n|^{1/n} \le |\xi_1| + |\xi_2| + \cdots + |\xi_n|,$$

we have also

Theorem 450. There are integers x\ , X 2 , . . . , x„, not all 0, for which 
(24.2.5) l$i$2...$„l < rr"n\ |A|. 
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(3) As a third application, we define P by 

$\xi_1^2 + \xi_2^2 + \cdots + \xi_n^2 < \lambda^2;$

this region is convex because

$(\mu\xi + \mu'\xi')^2 \leq (\mu + \mu')(\mu\xi^2 + \mu'\xi'^2)$

for positive $\mu$ and $\mu'$. The volume of $P$ is $\lambda^n J_n$, where†

$J_n = \int\!\!\int\!\cdots\!\int_{\xi_1^2 + \cdots + \xi_n^2 \leq 1} d\xi_1\,d\xi_2 \cdots d\xi_n = \frac{\pi^{n/2}}{\Gamma(\frac12 n + 1)}.$

Hence we obtain

Theorem 451. There are integers $x_1, x_2, \ldots, x_n$, not all 0, for which

(24.2.6) $\xi_1^2 + \xi_2^2 + \cdots + \xi_n^2 \leq 4\left(\frac{|\Delta|}{J_n}\right)^{2/n}.$

Theorem 451 may be expressed in a different way. A quadratic form $Q$
in $x_1, x_2, \ldots, x_n$ is a function

$Q(x_1, x_2, \ldots, x_n) = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{r,s}\,x_r x_s$

with $a_{s,r} = a_{r,s}$. The determinant $D$ of $Q$ is the determinant of its coefficients.
If $Q > 0$ for all $x_1, x_2, \ldots, x_n$, not all 0, then $Q$ is said to be positive
definite. It is familiar‡ that $Q$ can then be expressed in the form

$Q = \xi_1^2 + \xi_2^2 + \cdots + \xi_n^2,$

where $\xi_1, \xi_2, \ldots, \xi_n$ are linear forms with real coefficients and determinant
$\sqrt{D}$. Hence Theorem 451 may be restated as

Theorem 452. If $Q$ is a positive definite quadratic form in $x_1, x_2, \ldots, x_n$,
with determinant $D$, then there are integral values of $x_1, x_2, \ldots, x_n$, not all
0, for which

(24.2.7) $Q \leq 4D^{1/n}J_n^{-2/n}.$

† See, for example, Whittaker and Watson, Modern analysis, ed. 3 (1920), 258. For $n = 2$ and
$n = 3$ we get the values $\pi\lambda^2$ and $\frac43\pi\lambda^3$ for the volumes of a circle or a sphere.

‡ See, for example, Bôcher, Introduction to higher algebra, ch. 10, or Ferrar, Algebra, ch. 11.

24.3. Arithmetical proof of Theorem 448. There are various proofs
of Theorem 448 which do not depend on Theorem 446, and the great 
importance of the theorem makes it desirable to give one here. We confine 
ourselves for simplicity to the case n = 2. Thus we are given linear forms 


(24.3.1) $\xi = \alpha x + \beta y, \quad \eta = \gamma x + \delta y,$

with real coefficients and determinant $\Delta = \alpha\delta - \beta\gamma \neq 0$, and positive
numbers $\lambda$, $\mu$ for which $\lambda\mu \geq |\Delta|$; and we have to prove that

(24.3.2) $|\xi| \leq \lambda, \quad |\eta| \leq \mu,$

for some integral $x$ and $y$ not both 0. We may plainly suppose $\Delta > 0$.

We prove the theorem in three stages: (1) when the coefficients are integral
and each of the pairs $\alpha, \beta$ and $\gamma, \delta$ is coprime; (2) when the coefficients
are rational; and (3) in the general case.

(1) We suppose first that $\alpha$, $\beta$, $\gamma$, and $\delta$ are integers and that

$(\alpha, \beta) = (\gamma, \delta) = 1.$

Since $(\alpha, \beta) = 1$, there are integers $p$ and $q$ for which $\alpha q - \beta p = 1$. The
linear transformation

$\alpha x + \beta y = X, \quad px + qy = Y$

establishes a (1, 1) correlation between integral pairs $x, y$ and $X, Y$; and

$\xi = X, \quad \eta = rX + \Delta Y,$

where $r = \gamma q - \delta p$ is an integer. It is sufficient to prove that $|\xi| \leq \lambda$ and
$|\eta| \leq \mu$ for some integral $X$ and $Y$ not both 0.

If $\lambda < 1$ then $\mu \geq \Delta$, and $X = 0$, $Y = 1$ gives $\xi = 0$, $|\eta| = \Delta \leq \mu$.

If $\lambda \geq 1$, we take

$n = [\lambda], \quad \xi = -\frac{r}{\Delta}, \quad h = Y, \quad k = X$

in Theorem 36.† Then

$0 < k \leq [\lambda] \leq \lambda$

and

$|rX + \Delta Y| = \Delta X\left|\frac{r}{\Delta} + \frac{Y}{X}\right| \leq \frac{\Delta}{n+1} = \frac{\Delta}{[\lambda]+1} < \frac{\Delta}{\lambda} \leq \mu,$

so that $X = k$ and $Y = h$ satisfy our requirements.

† The $\xi$ here is naturally not the $\xi$ of this section.
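Stage (1) is constructive, and can be sketched as a short program. The helper names `ext_gcd` and `stage_one` are hypothetical, and the fraction $h/k$ of Theorem 36 is found here by a plain search over $k \leq n$ rather than by any method given in the text; exact rational arithmetic is used for the approximation step.

```python
from fractions import Fraction
from math import floor

def ext_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g, where g = gcd(a, b) >= 0."""
    if b == 0:
        return (a, 1, 0) if a >= 0 else (-a, -1, 0)
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def stage_one(alpha, beta, gamma, delta, lam, mu):
    """Integer coefficients with gcd(alpha, beta) = 1, Delta > 0, lam*mu >= Delta.
    Returns integers (x, y) != (0, 0) with |xi| <= lam and |eta| <= mu."""
    Delta = alpha * delta - beta * gamma
    g, s, t = ext_gcd(alpha, beta)
    assert g == 1 and Delta > 0 and lam * mu >= Delta
    q, p = s, -t                        # alpha*q - beta*p = 1
    r = gamma * q - delta * p           # eta = r*X + Delta*Y
    if lam < 1:
        X, Y = 0, 1
    else:
        n = floor(lam)
        target = Fraction(-r, Delta)    # the number approximated by h/k
        for k in range(1, n + 1):
            h = round(target * k)       # nearest integer to k * target
            if abs(target * k - h) <= Fraction(1, n + 1):
                X, Y = k, h
                break
    # Invert X = alpha*x + beta*y, Y = p*x + q*y.
    return q * X - beta * Y, -p * X + alpha * Y
```

For example, `stage_one(7, 3, 2, 5, 5, 6)` works with $\Delta = 29$ and returns a pair with $|7x + 3y| \leq 5$ and $|2x + 5y| \leq 6$.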

(2) We suppose next that $\alpha$, $\beta$, $\gamma$, and $\delta$ are any rational numbers. Then
we can choose positive $\rho$ and $\sigma$ so that

$\xi' = \rho\xi = \alpha'x + \beta'y, \quad \eta' = \sigma\eta = \gamma'x + \delta'y,$

where $\alpha'$, $\beta'$, $\gamma'$, and $\delta'$ are integers, $(\alpha', \beta') = 1$, $(\gamma', \delta') = 1$, and
$\Delta' = \alpha'\delta' - \beta'\gamma' = \rho\sigma\Delta$. Also $\rho\lambda \cdot \sigma\mu \geq \Delta'$, and therefore, after (1), there are
integers $x, y$, not both 0, for which

$|\xi'| \leq \rho\lambda, \quad |\eta'| \leq \sigma\mu.$

These inequalities are equivalent to (24.3.2), so that the theorem is proved
in case (2).

(3) Finally, we suppose $\alpha$, $\beta$, $\gamma$, and $\delta$ unrestricted. If we put
$\alpha = \alpha'\sqrt{\Delta}, \ldots, \xi = \xi'\sqrt{\Delta}, \ldots$, then $\Delta' = \alpha'\delta' - \beta'\gamma' = 1$. If the theorem
has been proved when $\Delta = 1$, and $\lambda'\mu' \geq 1$, then there are integral
$x, y$, not both 0, for which

$|\xi'| \leq \lambda', \quad |\eta'| \leq \mu';$

and these inequalities are equivalent to (24.3.2), with $\lambda = \lambda'\sqrt{\Delta}$,
$\mu = \mu'\sqrt{\Delta}$, $\lambda\mu \geq \Delta$. We may therefore suppose without loss of generality
that $\Delta = 1$.†

We can choose a sequence of rational sets $\alpha_n, \beta_n, \gamma_n, \delta_n$ such that

$\alpha_n\delta_n - \beta_n\gamma_n = 1$

and $\alpha_n \to \alpha$, $\beta_n \to \beta$, $\ldots$, when $n \to \infty$. It follows from (2) that there
are integers $x_n$ and $y_n$, not both 0, for which

(24.3.3) $|\alpha_n x_n + \beta_n y_n| \leq \lambda, \quad |\gamma_n x_n + \delta_n y_n| \leq \mu.$

Also

$|x_n| = |\delta_n(\alpha_n x_n + \beta_n y_n) - \beta_n(\gamma_n x_n + \delta_n y_n)| \leq \lambda|\delta_n| + \mu|\beta_n|,$

so that $x_n$ is bounded; and similarly $y_n$ is bounded. It follows, since $x_n$ and
$y_n$ are integral, that some pair of integers $x, y$ must occur infinitely often
among the pairs $x_n, y_n$. Taking $x_n = x$, $y_n = y$ in (24.3.3), and making
$n \to \infty$ through the appropriate values, we obtain (24.3.2).

† A similar appeal to homogeneity would enable us to reduce the proof of any of the theorems of
this chapter to its proof in the case in which $\Delta$ has any assigned value.

It is important to observe that this method of proof, by reduction to the case of rational
or integral coefficients, cannot be used for such a theorem as Theorem 450. This (when
$n = 2$) asserts that $|\xi\eta| \leq \frac12|\Delta|$ for appropriate $x, y$. If we try to use the argument of (3)
above, it fails because $x_n$ and $y_n$ are not necessarily bounded. The failure is natural, since
the theorem is trivial when the coefficients are rational: we can obviously choose $x$ and $y$
so that $\xi = 0$, $|\xi\eta| = 0 < \frac12|\Delta|$.

24.4. Best possible inequalities. It is easy to see that Theorem 448 is
the best possible theorem of its kind, in the sense that it becomes false if
(24.2.1) is replaced by

(24.4.1) $\lambda_1\lambda_2\cdots\lambda_n \geq k|\Delta|$

with any $k < 1$. Thus if $\xi_r = x_r$ for each $r$, so that $\Delta = 1$, and $\lambda_r = k^{1/n}$,
then (24.4.1) is satisfied; but $|\xi_r| \leq \lambda_r < 1$ implies $x_r = 0$, and there is no
solution of (24.2.2) except $x_1 = x_2 = \cdots = 0$.

It is natural to ask whether Theorems 449-51 are similarly ‘best possible’.
Except in one special case, the answer is negative; the numerical
constants on the right of (24.2.4), (24.2.5), and (24.2.6) can be replaced by
smaller numbers.

The special case referred to is the case $n = 2$ of Theorem 449. This
asserts that we can make

(24.4.2) $|\xi| + |\eta| \leq \sqrt{2|\Delta|},$

and it is easy to see that this is the best possible result. If $\xi = x + y$, $\eta = x - y$,
then $\Delta = -2$, and (24.4.2) is $|\xi| + |\eta| \leq 2$. But

$|\xi| + |\eta| = \max(|\xi + \eta|, |\xi - \eta|) = \max(|2x|, |2y|),$

and this cannot be less than 2 unless $x = y = 0$.†
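The identity used in this example is easy to confirm numerically. The sketch below (the function name `l1_value` and the size of the box are assumptions for illustration) tabulates $|\xi| + |\eta|$ for $\xi = x + y$, $\eta = x - y$ over a box of integer points and records the smallest value attained away from the origin.

```python
def l1_value(x, y):
    """|xi| + |eta| for the special forms xi = x + y, eta = x - y."""
    return abs(x + y) + abs(x - y)

# Tabulate over a small box of integer points, excluding the origin.
values = {
    (x, y): l1_value(x, y)
    for x in range(-20, 21)
    for y in range(-20, 21)
    if (x, y) != (0, 0)
}
smallest = min(values.values())
```

The smallest value equals $\sqrt{2|\Delta|} = 2$, so the bound (24.4.2) is attained but never beaten for these forms.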

Theorem 450 is not a best possible theorem even when $n = 2$. It then
asserts that

(24.4.3) $|\xi\eta| \leq \tfrac12|\Delta|,$

and we shall show in § 24.6 that the $\frac12$ here may be replaced by the smaller
constant $5^{-1/2}$. We shall also make a corresponding improvement in Theorem
451. This asserts (when $n = 2$) that

$\xi^2 + \eta^2 \leq 4\pi^{-1}|\Delta|,$

and we shall show that $4\pi^{-1} = 1.27\ldots$ may be replaced by $(\frac43)^{1/2} = 1.15\ldots$

We shall also show that $5^{-1/2}$ and $(\frac43)^{1/2}$ are the best possible constants.
When $n > 2$, the determination of the best possible constants is difficult.

† Actually the case $n = 2$ of Theorem 449 is equivalent to the corresponding case of Theorem 448.

24.5. The best possible inequality for $\xi^2 + \eta^2$. If

$Q(x, y) = ax^2 + 2bxy + cy^2$

is a quadratic form in $x$ and $y$ (with real, but not necessarily integral,
coefficients);

$x = px' + qy', \quad y = rx' + sy' \quad (ps - qr = \pm 1)$

is a unimodular substitution in the sense of § 3.6; and

$Q(x, y) = a'x'^2 + 2b'x'y' + c'y'^2 = Q'(x', y'),$

then we say that $Q$ is equivalent to $Q'$, and write $Q \sim Q'$. It is easily
verified that $a'c' - b'^2 = ac - b^2$, so that equivalent forms have the same
determinant. It is plain that the assertions that $|Q| \leq k$ for appropriate
integral $x, y$, and that $|Q'| \leq k$ for appropriate integral $x', y'$, are equivalent
to one another.

Now let $x_0, y_0$ be coprime integers such that $M = Q(x_0, y_0) \neq 0$. We
can choose $x_1, y_1$ so that $x_0y_1 - x_1y_0 = 1$. The transformation

(24.5.1) $x = x_0x' + x_1y', \quad y = y_0x' + y_1y'$

is unimodular and transforms $Q(x, y)$ into $Q'(x', y')$ with

$a' = ax_0^2 + 2bx_0y_0 + cy_0^2 = Q(x_0, y_0) = M.$

If we make the further unimodular transformation

(24.5.2) $x' = x'' + ny'', \quad y' = y'',$

where $n$ is an integer, $a' = M$ is unchanged and $b'$ becomes

$b'' = b' + na' = b' + nM.$

Since $M \neq 0$, we can choose $n$ so that $-|M| < 2b'' \leq |M|$. Thus we
transform $Q(x, y)$ by unimodular substitutions into

$Q''(x'', y'') = Mx''^2 + 2b''x''y'' + c''y''^2$

with $-|M| < 2b'' \leq |M|$.†

We can now improve the results of Theorems 450 and 451, for n = 2. 
We take the latter theorem first. 

Theorem 453. There are integers $x$, $y$, not both 0, for which

(24.5.3) $\xi^2 + \eta^2 \leq (\tfrac43)^{1/2}|\Delta|;$

and this is true with inequality unless

(24.5.4) $\xi^2 + \eta^2 \sim (\tfrac43)^{1/2}|\Delta|(x^2 + xy + y^2).$

We have

(24.5.5) $\xi^2 + \eta^2 = ax^2 + 2bxy + cy^2 = Q(x, y),$

where

(24.5.6) $a = \alpha^2 + \gamma^2, \quad b = \alpha\beta + \gamma\delta, \quad c = \beta^2 + \delta^2, \quad ac - b^2 = (\alpha\delta - \beta\gamma)^2 = \Delta^2 > 0.$

Then $Q > 0$ except when $x = y = 0$, and there are at most a finite number
of integral pairs $x, y$ for which $Q$ is less than any given $k$. It follows that,
among such integral pairs, not both 0, there is one, say $(x_0, y_0)$, for which
$Q$ assumes a positive minimum value $m$. Clearly $x_0$ and $y_0$ are coprime
and so, by what we have just said, $Q$ is equivalent to a form $Q''$, with
$a'' = m$ and $-m < 2b'' \leq m$. Thus (dropping the dashes) we may suppose
that the form is

$mx^2 + 2bxy + cy^2,$

where $-m < 2b \leq m$. Then $c \geq m$, since otherwise $x = 0$, $y = 1$ would
give a value less than $m$; and

(24.5.7) $\Delta^2 = mc - b^2 \geq m^2 - \tfrac14 m^2 = \tfrac34 m^2,$

so that $m \leq (\frac43)^{1/2}|\Delta|$.

This proves (24.5.3). There can be equality throughout (24.5.7) only if
$c = m$ and $b = \frac12 m$, in which case $Q \sim m(x^2 + xy + y^2)$. For this form the
minimum is plainly $(\frac43)^{1/2}|\Delta|$.

† A reader familiar with the elements of the theory of quadratic forms will recognize Gauss’s method
for transforming $Q$ into a ‘reduced’ form.
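The two substitutions used above (a shear that brings $2b$ into the range $-a < 2b \leq a$, and an interchange of the variables when $c < a$) can be iterated until $c \geq a$. The following is a minimal sketch of that reduction for positive definite forms with rational coefficients; the function name `reduce_form` and the use of `Fraction` are choices made here for illustration, not part of the text.

```python
from fractions import Fraction
from math import floor

def reduce_form(a, b, c):
    """Reduce the positive definite form a x^2 + 2 b x y + c y^2 by unimodular
    substitutions until -a < 2b <= a <= c; returns the new (a, b, c)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    assert a > 0 and a * c - b * b > 0
    while True:
        # Shear x -> x + n y: b becomes b + n a, c becomes Q(n, 1).
        n = floor(Fraction(1, 2) - b / a)
        b, c = b + n * a, c + 2 * n * b + n * n * a
        if c >= a:
            return a, b, c
        # Interchange the variables (x -> -y, y -> x): (a, b, c) -> (c, -b, a).
        a, b, c = c, -b, a

# Example: 10 x^2 + 14 x y + 5 y^2 has determinant ac - b^2 = 1.
reduced = reduce_form(10, 7, 5)
```

For a reduced form the leading coefficient is the minimum $m$ of the form, so the inequality $3m^2 \leq 4\Delta^2$ of (24.5.7) can be read off directly from the output.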

24.6. The best possible inequality for $|\xi\eta|$. Passing to the product $\xi\eta$,
we prove

Theorem 454. There are integers $x$, $y$, not both 0, for which

(24.6.1) $|\xi\eta| \leq 5^{-1/2}|\Delta|;$

and this is true with inequality unless

(24.6.2) $\xi\eta \sim 5^{-1/2}|\Delta|(x^2 + xy - y^2).$

The proof is a little less straightforward than that of Theorem 453 because
we are concerned with an ‘indefinite form’. We write

(24.6.3) $\xi\eta = ax^2 + 2bxy + cy^2 = Q(x, y),$

where

(24.6.4) $a = \alpha\gamma, \quad 2b = \alpha\delta + \beta\gamma, \quad c = \beta\delta, \quad 4(b^2 - ac) = \Delta^2 > 0.$

We write $m$ for the lower bound of $|Q(x, y)|$, for $x$ and $y$ not both zero; we
may plainly suppose that $m > 0$ since there is nothing to prove if $m = 0$.
There may now be no pair $x, y$ such that $|Q(x, y)| = m$, but there must be
pairs for which $|Q(x, y)|$ is as near to $m$ as we please. Hence we can find
a coprime pair $x_0$ and $y_0$ so that $m \leq |M| < 2m$, where $M = Q(x_0, y_0)$.
Without loss of generality we may take $M > 0$. If we transform as in
§ 24.5, and drop the dashes, our new quadratic form is

$Q(x, y) = Mx^2 + 2bxy + cy^2,$

where

(24.6.5) $m \leq M < 2m, \quad -M < 2b \leq M$

and

(24.6.6) $4(b^2 - Mc) = \Delta^2 > 0.$

By the definition of $m$, $|Q(x, y)| \geq m$ for all integral pairs $x, y$ other
than 0, 0. Hence if, for a particular pair, $Q(x, y) < m$, it follows that
$Q(x, y) \leq -m$. Now, by (24.6.5) and (24.6.6),

$Q(0, 1) = c < \frac{b^2}{M} \leq \tfrac14 M < m.$

Hence $c \leq -m$ and we write $C = -c \geq m > 0$. Again, taking the sign
opposite to that of $b$,

$Q(1, \mp 1) = M - |2b| - C \leq M - C \leq M - m < m$

and so $M - |2b| - C \leq -m$, that is

(24.6.7) $|2b| \geq M + m - C.$

If $M + m - C < 0$, we have $C > M + m \geq 2m$ and

$\Delta^2 = 4(b^2 + MC) \geq 4MC \geq 8m^2 > 5m^2.$

If $M + m - C \geq 0$, we have from (24.6.7)

$\Delta^2 = 4b^2 + 4MC \geq (M + m - C)^2 + 4MC = (M - m + C)^2 + 4Mm \geq 5m^2.$

Equality can occur only if $M - m + C = m$ and $M = m$, so that $M = C = m$
and $|2b| = m$. This corresponds to one or other of the two (equivalent) forms
$m(x^2 + xy - y^2)$ and $m(x^2 - xy - y^2)$. For these, $|Q(1, 0)| = m = 5^{-1/2}|\Delta|$.
For all other forms, $5m^2 < \Delta^2$ and so we may choose $x_0, y_0$ so that

$5m^2 \leq 5M^2 < \Delta^2.$

This is Theorem 454.
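The extremal form in (24.6.2) can be examined numerically. The sketch below computes the least value of $|Q|$ over a finite box of integer points for a form written as $ax^2 + (2b)xy + cy^2$ and compares $5m^2$ with $\Delta^2 = (2b)^2 - 4ac$. The function names, the box size, and the second example form are assumptions for illustration; a finite search gives only an upper estimate of the true lower bound $m$, although for the integral forms used here the value 1 is clearly the exact minimum.

```python
def min_abs_value(a, two_b, c, search=40):
    """Least |a x^2 + two_b x y + c y^2| over integer points (x, y) != (0, 0)
    in the box |x|, |y| <= search."""
    return min(
        abs(a * x * x + two_b * x * y + c * y * y)
        for x in range(-search, search + 1)
        for y in range(-search, search + 1)
        if (x, y) != (0, 0)
    )

def delta_squared(a, two_b, c):
    """Delta^2 = 4(b^2 - ac), written in terms of 2b."""
    return two_b * two_b - 4 * a * c

# Extremal form x^2 + x y - y^2: equality 5 m^2 = Delta^2.
m_golden = min_abs_value(1, 1, -1)
# Another indefinite form, x^2 - 2 y^2: strict inequality 5 m^2 < Delta^2.
m_other = min_abs_value(1, 0, -2)
```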
24.7. A theorem concerning non-homogeneous forms. We prove
next an important theorem of Minkowski concerning non-homogeneous
forms

(24.7.1) $\xi - \rho = \alpha x + \beta y - \rho, \quad \eta - \sigma = \gamma x + \delta y - \sigma.$

Theorem 455. If $\xi$ and $\eta$ are homogeneous linear forms in $x$, $y$, with
determinant $\Delta \neq 0$, and $\rho$ and $\sigma$ are real, then there are integral $x$, $y$ for
which

(24.7.2) $|(\xi - \rho)(\eta - \sigma)| \leq \tfrac14|\Delta|;$

and this is true with inequality unless

(24.7.3) $\xi = \theta u, \quad \eta = \phi v, \quad \theta\phi = \Delta, \quad \rho = \theta(f + \tfrac12), \quad \sigma = \phi(g + \tfrac12),$

where $u$ and $v$ are forms with integral coefficients (and determinant 1), and
$f$ and $g$ are integers.

It will be observed that this theorem differs from all which precede in
that we do not exclude the values $x = y = 0$. It would be false if we did
not allow this possibility, for example if $\xi$ and $\eta$ are the special forms of
Theorem 454 and $\rho = \sigma = 0$.

It will be convenient to restate the theorem in a different form. The
points in the plane $(\xi, \eta)$ corresponding to integral $x, y$ form a lattice $\Lambda$ of
determinant $\Delta$. Two points $P$, $Q$ are equivalent with respect to $\Lambda$ if the
vector $PQ$ is equal to the vector from the origin to a point of $\Lambda$;† and
$(\xi - \rho, \eta - \sigma)$, with integral $x, y$, is equivalent to $(-\rho, -\sigma)$. Hence the
theorem may be restated as

Theorem 456. If $\Lambda$ is a lattice of determinant $\Delta$ in the plane of $(\xi, \eta)$,
and $Q$ is any given point of the plane, then there is a point equivalent to $Q$
for which

(24.7.4) $|\xi\eta| \leq \tfrac14|\Delta|,$

with inequality except in the special case (24.7.3).

† See p. 42. It is the same thing to say that the corresponding points in the $(x, y)$ plane are equivalent
with respect to the fundamental lattice.

In what follows we shall be concerned with three sets of variables, $(x, y)$,
$(\xi, \eta)$, and $(\xi', \eta')$. We call the planes of the last two sets of variables $\pi$
and $\pi'$.

We may suppose $\Delta = 1$.† By Theorem 450 (and a fortiori by Theorem
454), there is a point $P_0$ of $\Lambda$, other than the origin, and corresponding to
$x_0, y_0$, for which

(24.7.5) $|\xi_0\eta_0| \leq \tfrac12.$

We may suppose $x_0$ and $y_0$ coprime (so that $P_0$ is ‘visible’ in the sense of
§ 3.6). Since $\xi_0$ and $\eta_0$ satisfy (24.7.5), and are not both 0, there is a real
positive $\lambda$ for which

(24.7.6) $(\lambda\xi_0)^2 + (\lambda^{-1}\eta_0)^2 = 1.$

We put

(24.7.7) $\xi' = \lambda\xi, \quad \eta' = \lambda^{-1}\eta.$

Then the lattice $\Lambda$ in $\pi$ corresponds to a lattice $\Lambda'$ in $\pi'$, also of determinant 1.
If $O'$ and $P_0'$ correspond to $O$ and $P_0$, then $P_0'$, like $P_0$, is visible;
and $O'P_0' = 1$, by (24.7.6). Thus the points of $\Lambda'$ on $O'P_0'$ are spaced out at
unit distances, and, since the area of the basic parallelogram of $\Lambda'$ is 1, the
other points of $\Lambda'$ lie on lines parallel to $O'P_0'$ which are at unit distances
from one another.

We denote by $S'$ the square whose centre is $O'$ and one of whose sides
bisects $O'P_0'$ perpendicularly.‡ Each side of $S'$ is 1; $S'$ lies in the circle

$\xi'^2 + \eta'^2 = 2(\tfrac12)^2 = \tfrac12,$

and

(24.7.8) $|\xi'\eta'| \leq \tfrac12(\xi'^2 + \eta'^2) \leq \tfrac14$

at all points of $S'$.

If $A'$ and $B'$ are two points inside $S'$, then each component of the vector
$A'B'$ (measured parallel to the sides of the square) is less than 1, so that $A'$
and $B'$ cannot be equivalent with respect to $\Lambda'$. It follows from Theorem
42 that there is a point of $S'$ equivalent to $Q'$ (the point of $\pi'$ corresponding
to $Q$). The corresponding point of $\pi$ is equivalent to $Q$, and satisfies

(24.7.9) $|\xi\eta| = |\xi'\eta'| \leq \tfrac14.$

This proves the main clause of Theorem 456 (or 455).

† See the footnote to p. 528.

‡ The reader should draw a figure.

If there is equality in (24.7.9), there must be equality in (24.7.8), so that
$|\xi'| = |\eta'| = \frac12$. This is only possible if $S'$ has its sides parallel to the
coordinate axes and the point of $S'$ in question is at a corner. In this case $P_0'$
must be one of the four points $(\pm 1, 0)$, $(0, \pm 1)$: let us suppose, for example,
that it is $(1, 0)$.

The lattice $\Lambda'$ can be based on $O'P_0'$ and $O'P_1'$, where $P_1'$ is on $\eta' = 1$. We
may suppose, selecting $P_1'$ appropriately, that it is $(c, 1)$, where $0 \leq c < 1$.
If the point of $S'$ equivalent to $Q'$ is, say, $(\frac12, \frac12)$, then $(\frac12 - c, \frac12 - 1)$,
i.e. $(\frac12 - c, -\frac12)$, is another point equivalent to $Q'$; and this can only be at a
corner of $S'$, as it must be, if $c = 0$. Hence $P_1'$ is $(0, 1)$, $\Lambda'$ is the fundamental
lattice in $\pi'$, and $Q'$, being equivalent to $(\frac12, \frac12)$, has coordinates

$\xi' = f + \tfrac12, \quad \eta' = g + \tfrac12,$

where $f$ and $g$ are integers. We are thus led to the exceptional case (24.7.3),
and it is plain that in this case the sign of equality is necessary.

24.8. Arithmetical proof of Theorem 455. We also give an arithmetical
proof of the main clause of Theorem 455. We transform it as in Theorem
456, and we have to show that, given $\mu$ and $\nu$, we can satisfy (24.7.4) with
an $x$ and a $y$ congruent to $\mu$ and $\nu$ to modulus 1.

We again suppose $\Delta = 1$. As in § 24.7, there are integers $x_0, y_0$, which
we may suppose coprime, for which

$|(\alpha x_0 + \beta y_0)(\gamma x_0 + \delta y_0)| \leq \tfrac12.$

We choose $x_1$ and $y_1$ so that $x_0y_1 - x_1y_0 = 1$. The transformation

$x = x_0x' + x_1y', \quad y = y_0x' + y_1y'$

changes $\xi$ and $\eta$ into forms $\xi' = \alpha'x' + \beta'y'$, $\eta' = \gamma'x' + \delta'y'$ for which

$|\alpha'\gamma'| = |(\alpha x_0 + \beta y_0)(\gamma x_0 + \delta y_0)| \leq \tfrac12.$





Hence, reverting to our original notation, we may suppose without loss of
generality that

(24.8.1) $|\alpha\gamma| \leq \tfrac12.$

It follows from (24.8.1) that there is a real $\lambda$ for which

$\lambda^2\alpha^2 + \lambda^{-2}\gamma^2 = 1;$

and

$2|(\alpha x + \beta y)(\gamma x + \delta y)| \leq \lambda^2(\alpha x + \beta y)^2 + \lambda^{-2}(\gamma x + \delta y)^2 = x^2 + 2bxy + cy^2 = (x + by)^2 + py^2,$

for some $b$, $c$, $p$. The determinant of this quadratic form is, on the one hand,
the square of that of $\lambda(\alpha x + \beta y)$ and $\lambda^{-1}(\gamma x + \delta y)$,† that is to say 1, and on
the other the square of that of $x + by$ and $p^{1/2}y$, that is to say $p$; and therefore
$p = 1$. Thus

$2|(\alpha x + \beta y)(\gamma x + \delta y)| \leq (x + by)^2 + y^2.$

We can choose $y \equiv \nu$ (mod 1) so that $|y| \leq \frac12$, and then $x \equiv \mu$ (mod 1) so
that $|x + by| \leq \frac12$; and then

$|\xi\eta| \leq \tfrac12\{(\tfrac12)^2 + (\tfrac12)^2\} = \tfrac14.$

We leave it to the reader to discriminate the cases of equality in this
alternative proof.
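Theorem 455 can be spot-checked by brute force. The sketch below evaluates $|(\xi - \rho)(\eta - \sigma)|$ over a box of integer points and returns the least value found. It is a finite search, not the argument of the text, and the function name `min_inhomogeneous`, the box size, and the sample values of $\rho$, $\sigma$ are assumptions for illustration.

```python
def min_inhomogeneous(forms, rho, sigma, search=20):
    """Least |(xi - rho)(eta - sigma)| over integers |x|, |y| <= search,
    where xi = a*x + b*y and eta = c*x + d*y; x = y = 0 is allowed."""
    (a, b), (c, d) = forms
    return min(
        abs((a * x + b * y - rho) * (c * x + d * y - sigma))
        for x in range(-search, search + 1)
        for y in range(-search, search + 1)
    )

identity = ((1, 0), (0, 1))       # xi = x, eta = y, Delta = 1
sheared = ((2, 1), (1, 1))        # integral forms u, v with determinant 1

equality_case = min_inhomogeneous(identity, 0.5, 0.5)   # exceptional case (24.7.3)
generic_case = min_inhomogeneous(identity, 0.3, 0.8)
sheared_case = min_inhomogeneous(sheared, 0.5, 0.5)
```

In the exceptional case the minimum is exactly $\frac14|\Delta|$; for other choices of $\rho$, $\sigma$ with the same forms it is strictly smaller.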

24.9. Tchebotaref’s theorem. It has been conjectured that Theorem
455 could be extended to $n$ dimensions, with $2^{-n}$ in place of $\frac14$; but this
has been proved only for $n = 3$ and $n = 4$. There is, however, a theorem
of Tchebotaref which goes some way in this direction.

Theorem 457. If $\xi_1, \xi_2, \ldots, \xi_n$ are homogeneous linear forms in
$x_1, x_2, \ldots, x_n$, with real coefficients and determinant $\Delta$; $\rho_1, \rho_2, \ldots, \rho_n$ are
real; and $m$ is the lower bound of

$|(\xi_1 - \rho_1)(\xi_2 - \rho_2)\cdots(\xi_n - \rho_n)|,$

then

(24.9.1) $m \leq 2^{-n/2}|\Delta|.$

† See (24.5.5) and (24.5.6).

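We may suppose $\Delta = 1$ and $m > 0$. Then, given any positive $\epsilon$, there
are integers $x_1^*, x_2^*, \ldots, x_n^*$ for which

(24.9.2) $\prod |\xi_i^* - \rho_i| = |(\xi_1^* - \rho_1)(\xi_2^* - \rho_2)\cdots(\xi_n^* - \rho_n)| = \frac{m}{1 - \theta}, \quad 0 \leq \theta < \epsilon.$

We put

$\xi_i' = \frac{\xi_i - \xi_i^*}{\xi_i^* - \rho_i} \quad (i = 1, 2, \ldots, n).$

Then $\xi_1', \ldots, \xi_n'$ are linear forms in $x_1 - x_1^*, \ldots, x_n - x_n^*$, with a determinant
$D$ whose absolute value is

$|D| = \left(\prod |\xi_i^* - \rho_i|\right)^{-1} = \frac{1 - \theta}{m};$

and the points in $\xi'$-space corresponding to integral $x$ form a lattice $\Lambda'$
whose determinant is of absolute value $(1 - \theta)/m$. Since

$\prod |\xi_i - \rho_i| \geq m,$

every point of $\Lambda'$ satisfies

$\prod |\xi_i' + 1| = \prod \left|\frac{\xi_i - \rho_i}{\xi_i^* - \rho_i}\right| \geq 1 - \theta.$

The same inequality is satisfied by the point symmetrical about the origin,
so that $\prod |\xi_i' - 1| \geq 1 - \theta$ and

(24.9.3) $\prod |\xi_i'^2 - 1| = |(\xi_1'^2 - 1)(\xi_2'^2 - 1)\cdots(\xi_n'^2 - 1)| \geq (1 - \theta)^2.$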

We now prove that when $\epsilon$ and $\theta$ are small, there is no point of $\Lambda'$, other
than the origin, in the cube $C'$ defined by

(24.9.4) $|\xi_i'| < \sqrt{1 + (1 - \theta)^2}.$

If there is such a point, it satisfies

(24.9.5) $-1 \leq \xi_i'^2 - 1 < (1 - \theta)^2 \leq 1 \quad (i = 1, 2, \ldots, n).$

If

(24.9.6) $\xi_i'^2 - 1 > -(1 - \theta)^2$

for some $i$, then $|\xi_i'^2 - 1| < (1 - \theta)^2$ for that $i$, and $|\xi_i'^2 - 1| \leq 1$ for every
$i$, so that

$\prod |\xi_i'^2 - 1| < (1 - \theta)^2,$

in contradiction to (24.9.3). Hence (24.9.6) is impossible, and therefore

$-1 \leq \xi_i'^2 - 1 \leq -(1 - \theta)^2 \quad (i = 1, 2, \ldots, n);$

and hence

(24.9.7) $|\xi_i'| \leq \sqrt{1 - (1 - \theta)^2} \leq \sqrt{2\theta} \quad (i = 1, 2, \ldots, n).$

Thus every point of $\Lambda'$ in $C'$ is very near to the origin when $\epsilon$ and $\theta$ are
small.

But this leads at once to a contradiction. For if $(\xi_1', \ldots, \xi_n')$ is a point
of $\Lambda'$, then so is $(N\xi_1', \ldots, N\xi_n')$ for every integral $N$. If $\theta$ is small, every
coordinate of a lattice point in $C'$ satisfies (24.9.7), and at least one of them
is not 0, then plainly we can choose $N$ so that $(N\xi_1', \ldots, N\xi_n')$, while still
in $C'$, is at a distance at least $\frac12$ from the origin, and therefore cannot satisfy
(24.9.7). The contradiction shows that, as we stated, there is no point of
$\Lambda'$, except the origin, in $C'$.

It is now easy to complete the proof of Theorem 457. Since there is no
point of $\Lambda'$, except the origin, in $C'$, it follows from Theorem 447 that the
volume of $C'$ does not exceed

$2^n|D| = 2^n(1 - \theta)/m;$

and therefore that

$2^n m\{1 + (1 - \theta)^2\}^{n/2} \leq 2^n(1 - \theta).$

Dividing by $2^n$, and making $\theta \to 0$, we obtain

$m \leq 2^{-n/2},$

the result of the theorem.

24.10. A converse of Minkowski’s Theorem 446. There is a partial
converse of Theorem 446, which we shall prove for the case n = 2. 
The result is not confined to convex regions and we therefore first redefine 
the area of a bounded region P, since the definition of §3.9 may no longer 
be applicable. 

For every $\rho > 0$, we denote by $\Lambda(\rho)$ the lattice of points $(\rho x, \rho y)$, where
$x, y$ take all integral values, and write $g(\rho)$ for the number of points of $\Lambda(\rho)$
(apart from the origin $O$) which belong to the bounded region $P$. We call

(24.10.1) $V = \lim_{\rho \to 0} \rho^2 g(\rho)$

the area of P, if the limit exists. This definition embodies the only prop- 
erty of area which we require in what follows. It is clearly equivalent to 
any natural definition of area for elementary regions such as polygons, 
ellipses, etc. 

We prove first 

Theorem 458. If P is a bounded plane region with an area V which is 
less than 1, there is a lattice of determinant 1 which has no point ( except 
perhaps O ) belonging to P. 

Since $P$ is bounded, there is a number $N$ such that

(24.10.2) $-N \leq \xi \leq N, \quad -N \leq \eta \leq N$

for every point $(\xi, \eta)$ of $P$. Let $p$ be any prime such that

(24.10.3) $p > N^2.$

Let $u$ be any integer and $\Lambda_u$ the lattice of points $(\xi, \eta)$ where

$\xi = \frac{X}{\sqrt{p}}, \quad \eta = \frac{uX + pY}{\sqrt{p}}$

and $X$, $Y$ take all integral values. The determinant of $\Lambda_u$ is 1. If Theorem
458 is false, there is a point $T_u$ belonging to both $\Lambda_u$ and $P$ and not coinciding
with $O$. Let the coordinates of $T_u$ be

$\xi_u = \frac{X_u}{\sqrt{p}}, \quad \eta_u = \frac{uX_u + pY_u}{\sqrt{p}}.$

If $X_u = 0$, we have

$\sqrt{p}\,|Y_u| = |\eta_u| \leq N < \sqrt{p}$

by (24.10.2) and (24.10.3). It follows that $Y_u = 0$ and $T_u$ is $O$, contrary to
our hypothesis. Hence $X_u \neq 0$ and

$0 < |X_u| = \sqrt{p}\,|\xi_u| \leq N\sqrt{p} < p.$

Thus

(24.10.4) $X_u \not\equiv 0 \pmod{p}.$

If $T_u$ and $T_v$ coincide, we have

$X_u = X_v, \quad uX_u + pY_u = vX_v + pY_v$

and so

$X_u(u - v) \equiv 0, \quad u \equiv v \pmod{p}$

by (24.10.4). Hence the $p$ points

(24.10.5) $T_0, T_1, T_2, \ldots, T_{p-1}$

are all different. Since they all belong to $P$ and to $\Lambda(p^{-1/2})$, it follows that

$g(p^{-1/2}) \geq p.$

But this is false for large enough $p$, since

$p^{-1}g(p^{-1/2}) \to V < 1$

by (24.10.1). Hence Theorem 458 is true.

For our next result we require the idea of visible points of a lattice
introduced in Ch. III. A point $T$ of $\Lambda(\rho)$ is visible (i.e. visible from the
origin) if $T$ is not $O$ and if there is no point of $\Lambda(\rho)$ on $OT$ between $O$ and
$T$. We write $f(\rho)$ for the number of visible points of $\Lambda(\rho)$ belonging to $P$
and prove the following lemma.

Theorem 459. $\rho^2 f(\rho) \to \dfrac{V}{\zeta(2)}$ as $\rho \to 0$.

The number of points of $\Lambda(\rho)$ other than $O$, whose coordinates satisfy
(24.10.2) is

$(2[N/\rho] + 1)^2 - 1.$

Hence

(24.10.6) $f(\rho) = g(\rho) = 0 \quad (\rho > N)$

and

(24.10.7) $f(\rho) \leq g(\rho) < 9N^2/\rho^2$

for all $\rho$.

Clearly $(\rho x, \rho y)$ is a visible point of $\Lambda(\rho)$ if, and only if, $x, y$ are coprime.
More generally, if $m$ is the highest common factor of $x$ and $y$, the point
$(\rho x, \rho y)$ is a visible point of $\Lambda(m\rho)$ but not of $\Lambda(k\rho)$ for any integral
$k \neq m$. Hence

$g(\rho) = \sum_{m=1}^{\infty} f(m\rho).$

By Theorem 270, it follows that

$f(\rho) = \sum_{m=1}^{\infty} \mu(m)\,g(m\rho).$

The convergence condition of that theorem is satisfied trivially since, by
(24.10.6), $f(m\rho) = g(m\rho) = 0$ for $m\rho > N$. Again, by Theorem 287,

$\frac{1}{\zeta(2)} = \sum_{m=1}^{\infty} \frac{\mu(m)}{m^2}$

and so

(24.10.8) $\rho^2 f(\rho) - \frac{V}{\zeta(2)} = \sum_{m=1}^{\infty} \frac{\mu(m)}{m^2}\{m^2\rho^2 g(m\rho) - V\}.$

Now let $\epsilon > 0$. By (24.10.1), there is a number $\rho_1 = \rho_1(\epsilon)$ such that

$|m^2\rho^2 g(m\rho) - V| < \epsilon$

whenever $m\rho \leq \rho_1$. Also, by (24.10.7),

$|m^2\rho^2 g(m\rho) - V| < 9N^2 + V$

for all $m$. If we write $M = [\rho_1/\rho]$, we have, by (24.10.8),

$\left|\rho^2 f(\rho) - \frac{V}{\zeta(2)}\right| < \epsilon\sum_{m=1}^{M}\frac{1}{m^2} + (9N^2 + V)\sum_{m=M+1}^{\infty}\frac{1}{m^2} < \frac{\epsilon\pi^2}{6} + \frac{9N^2 + V}{M} < 3\epsilon,$

if $\rho$ is small enough to make

$M = [\rho_1/\rho] > (9N^2 + V)/\epsilon.$

Since $\epsilon$ is arbitrary, Theorem 459 follows at once.
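Both the exact relation $g(\rho) = \sum f(m\rho)$ and the limit in Theorem 459 can be observed numerically. The sketch below takes $P$ to be the closed unit disc (so $V = \pi$) and $\rho = 1/R$ with $R$ a positive integer, which keeps all comparisons in integer arithmetic. These choices, and the function names `lattice_count` and `visible_count`, are assumptions made for illustration.

```python
from math import gcd, pi

def lattice_count(R):
    """g(1/R): points (x, y) != (0, 0) of the integer lattice with x^2 + y^2 <= R^2."""
    return sum(
        1
        for x in range(-R, R + 1)
        for y in range(-R, R + 1)
        if (x, y) != (0, 0) and x * x + y * y <= R * R
    )

def visible_count(R, m=1):
    """f(m/R): coprime pairs (x, y) with m^2 (x^2 + y^2) <= R^2."""
    bound = R // m
    return sum(
        1
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if gcd(x, y) == 1 and m * m * (x * x + y * y) <= R * R
    )

R = 100
ratio = visible_count(R) / R ** 2      # rho^2 f(rho) with rho = 1/R
limit = pi / (pi ** 2 / 6)             # V / zeta(2) = 6 / pi for the unit disc
```

For the disc the limiting value is $6/\pi \approx 1.91$, and `ratio` should already be close to it at $R = 100$.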

We can now show that the condition $V < 1$ of Theorem 458 can be
relaxed if we confine our result to regions of a certain special form. We say
that the bounded region $P$ is a star region provided that (i) $O$ belongs to $P$,
(ii) $P$ has an area $V$ defined by (24.10.1), and (iii) if $T$ is any point of $P$, then
so is every point of $OT$ between $O$ and $T$. Every convex region containing
$O$ is a star region; but there are star regions which are not convex. We can
now prove

Theorem 460. If $P$ is a star region, symmetrical about $O$ and of area
$V < 2\zeta(2) = \frac13\pi^2$, there is a lattice of determinant 1 which has no point
(except $O$) in $P$.

We use the same notation and argument as in the proof of Theorem 458.
If Theorem 460 is false, there is a $T_u$, different from $O$, belonging to $\Lambda_u$
and to $P$.

If $T_u$ is not a visible point of $\Lambda(p^{-1/2})$, we have $m > 1$, where $m$ is the
highest common factor of $X_u$ and $uX_u + pY_u$. By (24.10.4), $p \nmid X_u$ and so
$p \nmid m$. Hence $m \mid Y_u$. If we write $X_u = mX_u'$, $Y_u = mY_u'$, the numbers $X_u'$ and
$uX_u' + pY_u'$ are coprime. Thus the point $T_u'$, whose coordinates are

$\frac{X_u'}{\sqrt{p}}, \quad \frac{uX_u' + pY_u'}{\sqrt{p}},$

belongs to $\Lambda_u$ and is a visible point of $\Lambda(p^{-1/2})$. But $T_u'$ lies on $OT_u$ and so
belongs to the star region $P$. Hence, if $T_u$ is not visible, we may replace it
by a visible point.

Now $P$ contains the $p$ points

(24.10.9) $T_0, T_1, \ldots, T_{p-1},$

all visible points of $\Lambda(p^{-1/2})$, all different (as before) and none coinciding
with $O$. Since $P$ is symmetrical about $O$, $P$ also contains the $p$ points

(24.10.10) $\bar{T}_0, \bar{T}_1, \ldots, \bar{T}_{p-1},$

where $\bar{T}_u$ is the point $(-\xi_u, -\eta_u)$. All these $p$ points are visible points of
$\Lambda(p^{-1/2})$, all are different and none is $O$. Now $T_u$ and $\bar{T}_u$ cannot coincide
(for then each would be $O$). Again, if $u \neq v$ and $T_u$ and $\bar{T}_v$ coincide, we
have

$X_u = -X_v, \quad uX_u + pY_u = -vX_v - pY_v,$

$(u - v)X_u \equiv 0, \quad X_u \equiv 0 \text{ or } u \equiv v \pmod{p},$

both impossible. Hence the $2p$ points listed in (24.10.9) and (24.10.10) are
all different, all visible points of $\Lambda(p^{-1/2})$ and all belong to $P$ so that

(24.10.11) $f(p^{-1/2}) \geq 2p.$

But, by Theorem 459, as $p \to \infty$,

$p^{-1}f(p^{-1/2}) \to \frac{V}{\zeta(2)} = \frac{6V}{\pi^2} < 2$

by hypothesis, and so (24.10.11) is false for large enough $p$. Theorem 460
follows.

The above proofs of Theorems 458 and 460 extend at once to $n$
dimensions. In Theorem 460, $\zeta(2)$ is replaced by $\zeta(n)$.


NOTES 

§ 24.1. Minkowski’s writings on the geometry of numbers are contained in his books
Geometrie der Zahlen and Diophantische Approximationen, already referred to in the note
on § 3.10, and in a number of papers reprinted in his Gesammelte Abhandlungen (Leipzig,
1911). The fundamental theorem was first stated and proved in a paper of 1891 (Gesammelte
Abhandlungen, i. 265). There is a very full account of the history and bibliography of the
subject, up to 1936, in Koksma, chs. 2 and 3, and a survey of later progress by Davenport
in Proc. International Congress Math. (Cambridge, Mass., 1950), 1 (1952), 166-74. More
recent accounts of the whole subject are given by Cassels, Geometry of numbers; Gruber
and Lekkerkerker, Geometry of Numbers (North Holland, Amsterdam, 1987); and Erdős,
Gruber, and Hammer, Lattice points (Longman Scientific, Harlow, 1989).

Siegel [Acta Math. 65 (1935), 307-23] has shown that if $V$ is the volume of a convex
and symmetrical region $R$ containing no lattice point but $O$, then

$2^n = V + V^{-1}\sum |I|^2,$

where each $I$ is a multiple integral over $R$. This formula makes Minkowski’s theorem
evident.

Minkowski (Geometrie der Zahlen, 211-19) proved a further theorem which includes
and goes beyond the fundamental theorem. We suppose $R$ convex and symmetrical, and
write $\lambda R$ for $R$ magnified linearly about $O$ by a factor $\lambda$. We define $\lambda_1, \lambda_2, \ldots, \lambda_n$ as follows:
$\lambda_1$ is the least $\lambda$ for which $\lambda R$ has a lattice point $P_1$ on its boundary; $\lambda_2$ the least for which
$\lambda R$ has a lattice point $P_2$, not collinear with $O$ and $P_1$, on its boundary; $\lambda_3$ the least for
which $\lambda R$ has a lattice point $P_3$, not coplanar with $O$, $P_1$, and $P_2$, on its boundary; and so
on. Then

$0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$

($\lambda_2$, for example, being equal to $\lambda_1$ if $\lambda_1 R$ has a second lattice point, not collinear with $O$
and $P_1$, on its boundary); and

$\lambda_1\lambda_2\cdots\lambda_n V \leq 2^n.$

The fundamental theorem is equivalent to $\lambda_1^n V \leq 2^n$. Davenport [Quarterly Journal of
Math. (Oxford), 10 (1939), 117-21] has given a short proof of the more general theorem.
See also Bambah, Woods, and Zassenhaus (J. Australian Math. Soc. 5 (1965), 453-62) and
Henk (Rend. Circ. Mat. Palermo (II) Vol 1, Suppl. 70 (2002), 377-84).

§ 24.2. All these applications of the fundamental theorem were made by Minkowski.
Siegel, Math. Annalen, 87 (1922), 36-8, gave an analytic proof of Theorem 448: see
also Mordell, ibid. 103 (1930), 38-47.

Hajós, Math. Zeitschrift, 47 (1941), 427-67, has proved an interesting conjecture of
Minkowski concerning the ‘boundary case’ of Theorem 448. Suppose that $\Delta = 1$, so that
there are integral $x_1, x_2, \ldots, x_n$ such that $|\xi_r| \leq 1$ for $r = 1, 2, \ldots, n$. Can the $x_r$ be chosen
so that $|\xi_r| < 1$ for every $r$? Minkowski’s conjecture, now established by Hajós, was that
this is true except when the $\xi_r$ can be reduced, by a change of order and a unimodular
substitution, to the forms

$\xi_1 = x_1, \quad \xi_2 = a_{2,1}x_1 + x_2, \quad \ldots, \quad \xi_n = a_{n,1}x_1 + a_{n,2}x_2 + \cdots + x_n.$

The conjecture had been proved before only for $n \leq 7$.

The first general results concerning the minima of definite quadratic forms were found
by Hermite in 1847 (Œuvres, i, 100 et seq.): these are not quite so sharp as Minkowski’s.

§ 24.3. The first proof of this character was found by Hurwitz, Göttinger Nachrichten
(1897), 139-45, and is reproduced in Landau, Algebraische Zahlen, 34-40. The proof was
afterwards simplified by Weber and Wellstein, Math. Annalen, 73 (1912), 275-85, Mordell,
Journal London Math. Soc. 8 (1933), 179-82, and Rado, ibid. 9 (1934), 164-5 and 10
(1933), 115. The proof given here is substantially Rado’s (reduced to two dimensions).

§ 24.5. Theorem 453 is in Gauss, D.A., § 171. The corresponding results for forms in $n$
variables are known only for $n \leq 8$: see Koksma, 24, and Mordell, Journal London Math.
Soc. 19 (1944), 3-6.

§ 24.6. Theorem 454 was first proved by Korkine and ZolotarefF, Math. Annalen 6 
(1873), 366-89 (369). Our proof is due to Professor Davenport. See Macbeath, Journal 
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London Math . Soc . 22 (1947), 261-2, for another simple proof. There is a close connexion 
between Theorems 193 and 454. 

Theorem 454 is the first of a series of theorems, due mainly to Markoff, of which there
is a systematic account in Dickson, Studies, ch. 7. If $\xi\eta$ is not equivalent either to the form
in (24.6.2) or to

(a) $8^{-1/2}|\Delta|(x^2 + 2xy - y^2)$,

then

$|\xi\eta| < 8^{-1/2}|\Delta|$

for appropriate x, y; if it is not equivalent either to the form in (24.6.2), to (a), or to

(b) $(221)^{-1/2}|\Delta|(5x^2 + 11xy - 5y^2)$,

then

$|\xi\eta| < 5(221)^{-1/2}|\Delta|$;

and so on. The numbers on the right of these inequalities are

(c) $m(9m^2 - 4)^{-1/2}$,

where m is one of the ‘Markoff numbers’ 1, 2, 5, 13, 29, ...; and the numbers (c) have
the limit $\frac{1}{3}$. See also Cassels, Diophantine approximation, ch. 2 for an alternative proof of
these theorems.
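
As a quick numerical illustration of the constants (c), the following Python sketch (an addition for illustration only; the function name `markoff_constant` is ours) evaluates $m(9m^2-4)^{-1/2}$ for the Markoff numbers listed above. The first three values reproduce $5^{-1/2}$, $8^{-1/2}$ and $5(221)^{-1/2}$, and the sequence decreases towards $\frac{1}{3}$.

```python
from math import sqrt

def markoff_constant(m: int) -> float:
    """Return m / sqrt(9 m^2 - 4), the constant attached to the Markoff number m."""
    return m / sqrt(9 * m * m - 4)

# The first few Markoff numbers, as listed in the note above.
for m in (1, 2, 5, 13, 29):
    print(m, markoff_constant(m))
```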

There is a similar set of theorems associated with rational approximations to an irrational 
of which the simplest is Theorem 193: see §§ 11.8-10, and Koksma, 31-33. 

Davenport [Proc. London Math. Soc. (2) 44 (1938), 412-31, and Journal London Math.
Soc. 16 (1941), 98-101] has solved the corresponding problem for n = 3. We can make

$|\xi_1\xi_2\xi_3| < \tfrac{1}{7}|\Delta|$

unless

$\xi_1\xi_2\xi_3 \sim \tfrac{1}{7}\Delta \prod (x_1 + \theta x_2 + \theta^2 x_3)$,

where the product extends over the roots $\theta$ of $\theta^3 + \theta^2 - 2\theta - 1 = 0$. Mordell, in Journal
London Math. Soc. 17 (1942), 107-15, and a series of subsequent papers in the Journal 
and Proceedings, has obtained the best possible inequality for the minimum of a general 
binary cubic form with given determinant, and has shown how Davenport’s result can be 
deduced from it; and this has been the starting-point for a considerable body of work, by 
Mordell, Mahler, and Davenport, on lattice points in non-convex regions. 

The corresponding problem for n > 3 has not yet been solved. 

Minkowski [Göttinger Nachrichten (1904), 311-35; Gesammelte Abhandlungen, ii.
3-42] found the best possible result for $|\xi_1| + |\xi_2| + |\xi_3|$, viz.

$|\xi_1| + |\xi_2| + |\xi_3| \leq \left(\tfrac{108}{19}|\Delta|\right)^{1/3}$.

No simple proof of this result is known, nor any corresponding result with n > 3.

An alternative formulation of Theorem 454 states that if Q(x,y) is an indefinite quadratic
form of determinant D, then there are integer values $x_0, y_0$, not both zero, for which
$|Q(x_0,y_0)| \leq 2\sqrt{|D|/5}$. It is natural to ask what happens for quadratic forms in more
than 2 variables. It was conjectured by Oppenheim in 1929 that if Q is an indefinite form
in $n \geq 3$ variables, and not proportional to an integral form, then $Q(x_1, \ldots, x_n)$ attains
arbitrarily small values at integral arguments $x_1, \ldots, x_n$ not all zero. This was proved by
Margulis (Dynamical systems and ergodic theory (Warsaw, 1986), 399-409).

§§ 24.7-8. Minkowski proved Theorem 455 in Math. Annalen, 54 (1901), 91-124
(Gesammelte Abhandlungen, i. 320-56, and Diophantische Approximationen, 42-7). The
proof in § 24.7 is due to Heilbronn and that in § 24.8 to Landau, Journal für Math. 165
(1931), 1-3: the two proofs, though very different in form, are based on the same idea.
Davenport [Acta Math. 80 (1948), 65-95] solved the corresponding problem for indefinite 
ternary quadratic forms. 

§ 24.9. The conjecture mentioned at the beginning of this section is usually attributed 
to Minkowski, but Dyson [Annals of Math. 49 (1948), 82-109] remarks that he can find 
no reference to it in Minkowski’s published work. The statement is easy to prove when the 
coefficients of the forms are rational. Remak [Math. Zeitschrift, 17 (1923), 1-34 and 18 
(1923), 173-200] proved the truth of the conjecture for n = 3, Dyson [loc. cit.] for n = 4.
Davenport [Journal London Math. Soc. 14 (1939), 47-51] gave a much shorter proof for 
n = 3. 

The Remak-Davenport-Dyson approach depends on the observation that Minkowski’s 
conjecture follows from the following two conjectures. 

Conjecture I: For each lattice L in n-dimensional Euclidean space, there is an ellipsoid
of the form

$a_1x_1^2 + \cdots + a_nx_n^2 \leq 1$

which contains n linearly independent points of L on its boundary and has no point of L in
its interior other than O.

Conjecture II: Let L be a lattice of determinant 1 in n-dimensional Euclidean space and let
S be a sphere centred at O which contains n linearly independent points of L on its boundary
but no point of L in its interior other than O. Then the family $\{(\sqrt{n}/2)S + A : A \in L\}$ covers
the whole space.

Woods in a series of three papers (Mathematika 12 (1965), 138-42, 143-50 and J.
Number Theory 4 (1972), 157-80) gave a simple proof of Conjecture II for n = 4 and
proved it for n = 5, 6. For Conjecture I, Bambah and Woods (J. Number Theory 12 (1980),
27-48) gave a simple proof for n = 4. Around the same time, Skubenko (Zap. Naucn.
Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 33 (1973), 6-36 and Trudy Mat. Inst.
Steklov 142 (1976), 240-53) outlined a proof for $n \leq 5$. A complete proof for n = 5, on
the lines suggested by Skubenko, was given by Bambah and Woods (J. Number Theory 12
(1980), 27-48). McMullen (J. Amer. Math. Soc. 18 (2005), 711-34) later proved Conjecture
I for all n. This, together with the results on Conjecture II mentioned above, implies that
Minkowski’s conjecture is proved for all $n \leq 6$. Another proof for n = 3 was given by
Birch and Swinnerton-Dyer (Mathematika 3 (1956), 25-39) and still another approach via
factorization of matrices was explored by Macbeath (Proc. Glasgow Math. Assoc. 5 (1961),
86-89) and later by Narzullaev in a series of papers. Gruber (1976) and Ahmedov (1977)
showed however that this approach will not be successful for large n.

Tchebotaref’s theorem appeared in Bulletin Univ. Kasan (2) 94 (1934), Heft 7, 3-16; the
proof is reproduced in Zentralblatt für Math. 18 (1938), 110-11. Mordell [Vierteljahrsschrift
d. Naturforschenden Ges. in Zürich, 85 (1940), 47-50] has shown that the result may be
sharpened a little. See also Davenport, Journal London Math. Soc. 21 (1946), 28-34.

For more details, including asymptotic results and references, the reader is referred to
Gruber and Lekkerkerker, Geometry of Numbers; and Bambah, Dumir, and Hans-Gill
(Number Theory, 15-41, Birkhäuser, Basel 2000).

Minkowski’s conjecture for n = 2 (i.e. Theorem 455) can be interpreted as a problem 
on non-homogeneous binary indefinite quadratic forms. Its generalization to indefinite 
quadratic forms in n variables has aroused the interest of various writers including Bambah, 
Birch, Blaney, Davenport, Dumir, Foster, Hans-Gill, Madhu Raka, Watson, and Woods. 
In particular, Watson (Proc. London Math. Soc. (3) 12 (1962), 564-76) found the optimal
result for $n \geq 21$ and made a corresponding conjecture for $4 \leq n < 21$. This conjecture
was later proved by Dumir, Hans-Gill, and Woods (J. Number Theory 4 (1994), 190-197).
Positive values of quadratic forms and asymmetric inequalities have also been studied and 
analogous results obtained. For references and related results see Bambah, Dumir, and 
Hans-Gill loc. cit. 

§ 24.10. Minkowski [Gesammelte Abhandlungen (Leipzig, 1911), i. 265, 270, 277] first
conjectured the n-dimensional generalizations of Theorems 458 and 460 and proved the
latter for the n-dimensional sphere [loc. cit. ii. 95]. The first proof of the general theorems
was given by Hlawka [Math. Zeitschrift, 49 (1944), 285-312]. Our proof is due to Rogers
[Annals of Math. 48 (1947), 994-1002 and Nature 159 (1947), 104-5]. See also Rogers,
Packing and Covering for an account of the Minkowski-Hlawka theorems and subsequent
improvements.

25.1. The congruent number problem. A congruent number is a rational
number q that is the area of a right triangle, all of whose sides have
rational length. We observe that if the triangle has sides a, b, and c, and if s
is a rational number, then $s^2q$ is also a congruent number whose associated
triangle has sides sa, sb, and sc. So it is enough to ask which squarefree
integers n are congruent numbers.

If we take c to be the length of the hypotenuse, then we are looking for 
squarefree integers n such that there are rational numbers a, b, c satisfying 

(25.1.1) $a^2 + b^2 = c^2$ and $\tfrac{1}{2}ab = n$.

A simple algebraic calculation shows that the positive solutions to the 
simultaneous equations (25.1.1) are in one-to-one correspondence with 
the positive solutions to the equation 

(25.1.2) $y^2 = x^3 - n^2x$

via the transformations

$x = \dfrac{n(a+c)}{b}, \qquad y = \dfrac{2n^2(a+c)}{b^2},$

$a = \dfrac{y}{x}, \qquad b = \dfrac{2nx}{y}, \qquad c = \dfrac{x^2+n^2}{y}.$


Thus n is a congruent number if and only if (25.1.2) has a solution in 
positive rational numbers x and y. 
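
The correspondence can be checked on a familiar triangle. The following Python sketch is an illustration only and is not part of the original text: the function names are ours, and the formulae used are $x = n(a+c)/b$, $y = 2n^2(a+c)/b^2$ together with the inverse $a = y/x$, $b = 2nx/y$, $c = (x^2+n^2)/y$. Exact rational arithmetic is done with the standard `fractions` module.

```python
from fractions import Fraction

def triangle_to_point(a, b, c, n):
    """Map a rational right triangle (a, b, c) of area n to a point on y^2 = x^3 - n^2 x."""
    a, b, c, n = map(Fraction, (a, b, c, n))
    if a * a + b * b != c * c or a * b / 2 != n:
        raise ValueError("not a right triangle of area n")
    x = n * (a + c) / b
    y = 2 * n * n * (a + c) / (b * b)
    return x, y

def point_to_triangle(x, y, n):
    """Inverse map: recover (a, b, c) from a point (x, y) with x, y nonzero."""
    x, y, n = map(Fraction, (x, y, n))
    return y / x, 2 * n * x / y, (x * x + n * n) / y

# The 3-4-5 triangle has area 6, so 6 is a congruent number.
x, y = triangle_to_point(3, 4, 5, 6)
print(x, y)  # 12 36
print(point_to_triangle(x, y, 6) == (3, 4, 5))
```

Here the point (12, 36) does lie on $y^2 = x^3 - 36x$, since $36^2 = 1296 = 1728 - 432$.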

Equation (25.1.2) is an example of a Diophantine equation, similar to 
those discussed in Chapter XIII. Equations of this shape are called elliptic
curves, although we must note that the name is somewhat unfortunate,
since elliptic curves and ellipses have very little to do with one another. 
More generally, an elliptic curve is given by an equation of the form 

(25.1.3) $E: y^2 = x^3 + Ax + B$,

with the one further requirement that the discriminant

(25.1.4) $\Delta = 4A^3 + 27B^2$

should not vanish. The discriminant condition ensures that the cubic polynomial
has distinct (complex) roots and that the locus of E in the real plane
is nonsingular. For convenience, we shall generally assume that the coefficients
A and B are integers. It is also convenient to write $E(\mathbb{R})$ for the
solutions to (25.1.3) in real numbers, $E(\mathbb{Q})$ for the solutions in rational
numbers, and so on.

Elliptic curves form a family of Diophantine equations. They have many 
fascinating properties, some of which we shall touch upon in this chapter. 
Elliptic curves have provided the testing ground for numerous theorems 
and conjectures in number theory, and there are many number theoretic 
problems, such as the congruent number problem, whose solution leads 
naturally to one or more elliptic curves. Most notable among the recent 
applications of elliptic curves is Wiles’ proof of Fermat’s Last Theorem.
Wiles makes extensive use of elliptic curves, despite the fact that when
$n \geq 4$, the Fermat equation $x^n + y^n = z^n$ is itself most definitely not an
elliptic curve.

25.2. The addition law on an elliptic curve. In studying the solutions
of equation (25.1.3), each nonzero number u gives an equivalent equation

(25.2.1) $Y^2 = X^3 + u^4AX + u^6B$

via the identification $(x,y) = (u^{-2}X, u^{-3}Y)$. We say that (25.1.3) and
(25.2.1) define isomorphic elliptic curves. If A, B, and u are all in a given
field k, we say that the curves are isomorphic over k, in which case there
is a natural bijection between the solutions of (25.1.3) and (25.2.1) with
coordinates in k.

The j-invariant of E is the quantity

$j(E) = \dfrac{4A^3}{4A^3 + 27B^2} = \dfrac{4A^3}{\Delta}.$

If E and E' are isomorphic, then j(E) = j(E'), and over an algebraically
closed field such as $\mathbb{C}$, the converse is true. Over other fields, such as $\mathbb{Q}$,
the situation is slightly more complicated, since the value of u is restricted.
There are three cases, depending on whether one of A or B vanishes.

Theorem 461. Let E and E' be elliptic curves given by equations

$E: y^2 = x^3 + Ax + B$ and $E': y^2 = x^3 + A'x + B'$

having coefficients in some field k. Then E and E' are isomorphic over k if
and only if j(E) = j(E') and one of the following conditions holds:

(a) $A = A' = 0$ and $B/B'$ is a 6th power in k;
(b) $B = B' = 0$ and $A/A'$ is a 4th power in k;
(c) $ABA'B' \neq 0$ and $AB'/A'B$ is a square in k.


Suppose first that $AB \neq 0$, so $j(E) \neq 0$ and $j(E) \neq 1$. If E and E' are
isomorphic over k, then the relations $A' = u^4A$ and $B' = u^6B$ immediately
imply that j(E') = j(E), so $A'B' \neq 0$, and also

$\dfrac{AB'}{A'B} = \dfrac{Au^6B}{u^4AB} = u^2$

is a square in k.

Conversely, suppose that j(E) = j(E') and $AB'/A'B = u^2$ for some
$u \in k$. The j-invariant assumption implies that

$\dfrac{A^3}{B^2} = \dfrac{27j(E)}{4 - 4j(E)} = \dfrac{27j(E')}{4 - 4j(E')} = \dfrac{A'^3}{B'^2}.$

Hence

$u^4 = \left(\dfrac{AB'}{A'B}\right)^2 = \dfrac{A'}{A}$ and $u^6 = \left(\dfrac{AB'}{A'B}\right)^3 = \dfrac{B'}{B}$,

that is, $A' = u^4A$ and $B' = u^6B$, so E and E' are isomorphic over k. The cases A = 0 and B = 0 are handled
similarly.
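
Case (c) of Theorem 461 is easy to test over $\mathbb{Q}$. The sketch below is an illustration only, not part of the original text: the function names are ours, it handles case (c) alone (all four coefficients nonzero), and it compares a curve with the curves obtained from u = 2 and from the coefficient change $(A, B) \mapsto (d^2A, d^3B)$ with d = 2. The latter has the same j-invariant, but $AB'/A'B = 2$ is not a rational square.

```python
from fractions import Fraction
from math import isqrt

def j_invariant(A, B):
    """j(E) = 4A^3 / (4A^3 + 27B^2), in the normalization used in the text."""
    A, B = Fraction(A), Fraction(B)
    disc = 4 * A**3 + 27 * B**2
    if disc == 0:
        raise ValueError("discriminant vanishes")
    return 4 * A**3 / disc

def is_rational_square(q):
    """True if the rational number q is the square of a rational number."""
    q = Fraction(q)
    if q < 0:
        return False
    n, d = q.numerator, q.denominator
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d

def isomorphic_over_Q_case_c(A, B, A2, B2):
    """Criterion (c) of Theorem 461 over Q; requires A, B, A2, B2 all nonzero."""
    if 0 in (A, B, A2, B2):
        raise ValueError("only case (c) is implemented")
    ratio = Fraction(A) * B2 / (Fraction(A2) * B)
    return j_invariant(A, B) == j_invariant(A2, B2) and is_rational_square(ratio)

print(isomorphic_over_Q_case_c(1, 1, 16, 64))  # u = 2: A' = 16A, B' = 64B
print(isomorphic_over_Q_case_c(1, 1, 4, 8))    # same j, but AB'/A'B = 2
```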

One of the properties that makes an elliptic curve E such a fascinating 
object is the existence of a composition law that allows us to ‘add’ points 
to one another. In order to do this, we visualize the real solutions (x,y) of 
(25.1.3) as points in the Cartesian plane. The geometric description of the 
addition law on E is then quite simple. Let P and Q be distinct points on 
E and let L be the line through P and Q. Then the fact that E is given by
an equation (25.1.3) of degree 3 means that L intersects E in three points.†
Two of these points are P and Q. If we let R denote the third point in $L \cap E$,
then the sum of P and Q is defined by


P + Q = (the reflection of R across the x-axis). 


In order to add P to itself, we let Q approach P, so L becomes the tangent 
line to E at P. The addition law on E is illustrated in Figure 11. 

† The intersection points must be counted with appropriate multiplicity, and there are some special
cases that we shall deal with presently.

Fig. 11. The addition law on an elliptic curve (surviving panel title: ‘Addition of distinct points’)


The one situation in which addition fails is when the line L is vertical. 
For later convenience, we define the negation of a point P = (x,y) to be 
its reflection across the x-axis, 

$-P = (x, -y)$.

The line L through P and —P intersects E in only these two points, so there 
is no third point R to use in the addition law. To remedy this situation, we 
adjoin an idealized point O to the plane. This point O, which we call the 
point at infinity, has the property that it lies on every vertical line and on no 
other lines.‡ Further, the tangent line to E at O is defined to have a triple
order contact with E at O. Then the geometric addition law on E is defined
for all pairs of points. In particular, the special rules relating to the point
O are

(25.2.2) $P + (-P) = O$ and $P + O = P$ for all points P on E.

‡ Those who are familiar with the projective plane $\mathbb{P}^2$ will recognize that O is one of the points on
the line at infinity. The projective plane may be constructed by adjoining to the affine plane $\mathbb{A}^2$ one
additional point for each direction, i.e. for each line through (0, 0).

We now use a small amount of analytic geometry and calculus to derive
formulae for the addition law. Let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$ be two
points on the curve E. If $P = -Q$, then $P + Q = O$, so we assume that
$P \neq -Q$. We denote by

$L: y = \lambda x + \nu$

the line through P and Q if they are distinct, or the tangent line to E at P
if they coincide. Explicitly,

(25.2.3) $\lambda = \dfrac{y_Q - y_P}{x_Q - x_P}$ and $\nu = \dfrac{y_Px_Q - y_Qx_P}{x_Q - x_P}$, if $P \neq Q$,

(25.2.4) $\lambda = \dfrac{3x_P^2 + A}{2y_P}$ and $\nu = \dfrac{-x_P^3 + Ax_P + 2B}{2y_P}$, if $P = Q$.

We compute the intersection of E and L by solving the equation

(25.2.5) $(\lambda x + \nu)^2 = x^3 + Ax + B$.

The intersection of E and L includes the points P and Q, so two of the roots
of the cubic equation (25.2.5) are $x_P$ and $x_Q$. (If P = Q, then $x_P$ will appear
as a double root, since L is tangent to E at P.) Letting $R = (x_R, y_R)$ denote
the third intersection point of E and L, equation (25.2.5) factors as

(25.2.6) $x^3 - \lambda^2x^2 + (A - 2\lambda\nu)x + (B - \nu^2) = (x - x_P)(x - x_Q)(x - x_R)$.

Comparing the quadratic terms of (25.2.6) gives the formula

(25.2.7) $x_R = \lambda^2 - x_P - x_Q$,

and then the formula for L gives the corresponding $y_R = \lambda x_R + \nu$. Finally,
the sum of P and Q is computed by reflecting across the x-axis,

(25.2.8) $P + Q = (x_R, -y_R)$.

For later use, we compute explicitly the duplication formula

(25.2.9) $x_{2P} = \left(\dfrac{3x_P^2 + A}{2y_P}\right)^2 - 2x_P = \dfrac{x_P^4 - 2Ax_P^2 - 8Bx_P + A^2}{4x_P^3 + 4Ax_P + 4B}.$
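
The formulae (25.2.3) to (25.2.8) translate directly into a short program. The following Python sketch is an illustration only and not part of the original text: the name `ec_add` and the use of `None` for the point O are our own conventions, and exact rational arithmetic is assumed. Note that the coefficient B never enters the formulae explicitly.

```python
from fractions import Fraction

O = None  # stands for the point at infinity

def ec_add(P, Q, A):
    """Return P + Q on y^2 = x^3 + A*x + B (B is not needed by the formulae)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (xP, yP), (xQ, yQ) = P, Q
    if xP == xQ and yP == -yQ:
        return O                                # vertical line: P = -Q
    if xP == xQ:                                # here P = Q and yP != 0
        lam = (3 * xP * xP + A) / (2 * yP)      # tangent slope, (25.2.4)
    else:
        lam = (yQ - yP) / (xQ - xP)             # chord slope, (25.2.3)
    nu = yP - lam * xP
    xR = lam * lam - xP - xQ                    # (25.2.7)
    yR = lam * xR + nu
    return (xR, -yR)                            # reflect across the x-axis, (25.2.8)

# Doubling P = (3, 5) on y^2 = x^3 - 2, i.e. A = 0 and B = -2.
P = (Fraction(3), Fraction(5))
print(ec_add(P, P, 0))
```

For this point the tangent slope is 27/10, and one finds 2P = (129/100, -383/1000), in agreement with the duplication formula (25.2.9), which gives $x_{2P} = (81 + 48)/100$.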

Theorem 462. Let E be an elliptic curve. The addition law described
above has the following properties:

(a) [Identity] $P + O = O + P = P$ for all $P \in E$.
(b) [Inverse] $P + (-P) = O$ for all $P \in E$.
(c) [Associativity] $(P + Q) + R = P + (Q + R)$ for all $P, Q, R \in E$.
(d) [Commutativity] $P + Q = Q + P$ for all $P, Q \in E$.

The identity and inverse formulae are true by construction, since we have
placed O to lie on every vertical line and to have a tangent line with a triple 
order contact. Commutativity is also clear, since P + Q is computed using 
the line through P and Q, while Q+P is computed using the line through Q 
and P, which is the same line. The associative law is more difficult. It may 
be proven by a long and tedious algebraic calculation using the addition 
formulae and considering many special cases, or it may be proven using 
more advanced techniques from algebraic geometry or complex analysis. 

The content of Theorem 462 is that the set of points of E forms a com- 
mutative group with identity element O. Repeated addition and negation 
allows us to ‘multiply’ points of E by an arbitrary integer m. This function 
from E to itself is called the multiplication-by-m map, 


(25.2.10) $\phi_m: E \to E, \qquad \phi_m(P) = mP = \operatorname{sign}(m)\,(\underbrace{P + P + \cdots + P}_{|m| \text{ terms}})$.

(By convention, we also define $\phi_0(P) = O$.)
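
In practice mP is computed by repeated doubling rather than by |m| - 1 additions; that the two agree is a consequence of the associative law. The sketch below is an illustration only, not part of the original text: the names `ec_add`, `ec_neg` and `ec_mul` are ours, `None` stands for O, and the sample curve $y^2 = x^3 - 43x + 166$ with the point (3, 8) is chosen because a short hand calculation shows that this point has order 7.

```python
from fractions import Fraction

O = None  # the point at infinity

def ec_neg(P):
    """Negation: reflect across the x-axis."""
    return O if P is O else (P[0], -P[1])

def ec_add(P, Q, A):
    """P + Q on y^2 = x^3 + A*x + B, following (25.2.3)-(25.2.8)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (xP, yP), (xQ, yQ) = P, Q
    if xP == xQ and yP == -yQ:
        return O
    lam = (3 * xP * xP + A) / (2 * yP) if xP == xQ else (yQ - yP) / (xQ - xP)
    xR = lam * lam - xP - xQ
    return (xR, -(lam * (xR - xP) + yP))

def ec_mul(m, P, A):
    """Return mP by double-and-add; a negative m is handled as (-m)(-P)."""
    if m < 0:
        return ec_mul(-m, ec_neg(P), A)
    result, addend = O, P
    while m:
        if m & 1:
            result = ec_add(result, addend, A)
        addend = ec_add(addend, addend, A)
        m >>= 1
    return result

A = -43
P = (Fraction(3), Fraction(8))
for m in range(-1, 8):
    print(m, ec_mul(m, P, A))
```

The multiples run through (3, 8), (-5, -16), (11, -32), (11, 32), (-5, 16), (3, -8) and then O.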

Theorem 462 says that the set of points of E forms a commutative group. 
The next result says that the same is true if we take points whose coordinates 
lie in any field. 

Theorem 463. Let E be an elliptic curve given by an equation (25.1.3) 
whose coefficients A and B are in a field k and let 

$E(k) = \{(x,y) \in k^2 : y^2 = x^3 + Ax + B\} \cup \{O\}$.

Then the sum and difference of two points in E(k) is again in E(k), so E(k)
is a commutative group.

The proof is immediate, since a brief examination of the formulae for
addition on E shows that if A and B are in k and if the coordinates of P and Q
are in k, then the coordinates of $P \pm Q$ are also in k. The crucial feature of
the addition formulae is that they are all given by rational functions; at no
stage are we required to take roots. Thus E(k) is closed under addition and
subtraction, and Theorem 462 says that the addition law has the requisite
properties to make E(k) into a commutative group.

If k is a field of arithmetic interest, for example $\mathbb{Q}$ or $k(i)$ or a finite field
$\mathbb{F}_p$, then a description of the solutions to the Diophantine equation

$y^2 = x^3 + Ax + B$ with $x, y \in k$





may be accomplished by describing the group E(k). To illustrate, we
describe (without proof) the group of points with rational coordinates on
the four curves

$E_1: y^2 = x^3 + 7$, $\quad E_2: y^2 = x^3 - 43x + 166$,
$E_3: y^2 = x^3 - 2$, $\quad E_4: y^2 = x^3 + 17$.

The curve $E_1$ has no nontrivial rational points, so $E_1(\mathbb{Q}) = \{O\}$. The curve
$E_2$ has finitely many rational points. More precisely, $E_2(\mathbb{Q})$ is a cyclic
group with 7 elements,

$E_2(\mathbb{Q}) = \{(3, \pm 8), (-5, \pm 16), (11, \pm 32), O\}$.

The curves $E_3$ and $E_4$, by way of contrast, have infinitely many rational
points. The group $E_3(\mathbb{Q})$ is freely generated by the single point P = (3, 5),
in the sense that every point in $E_3(\mathbb{Q})$ has the form nP for a unique $n \in \mathbb{Z}$.
Similarly, the points P = (-2, 3) and Q = (2, 5) freely generate $E_4(\mathbb{Q})$
in the sense that every point in $E_4(\mathbb{Q})$ has the form mP + nQ for a unique
pair of integers $m, n \in \mathbb{Z}$. We note that none of these assertions concerning
$E_1, E_2, E_3, E_4$ is obvious.

It is quite easy to characterize the points of order 2 on an elliptic curve. 

Theorem 464. A point $P = (x,y) \neq O$ on an elliptic curve E is a point
of order 2, i.e. satisfies 2P = O, if and only if y = 0.

According to the geometric description of the addition law, a point P has
order 2 if and only if the tangent line to E at P is vertical. The slope of the
tangent line L at P = (x,y) satisfies

$2y\dfrac{dy}{dx} = 3x^2 + A$,

hence L is vertical if and only if y = 0. (Note that it is not possible to have
both y = 0 and $3x^2 + A = 0$, since y = 0 implies that $x^3 + Ax + B = 0$,
and the condition $\Delta \neq 0$ ensures that $x^3 + Ax + B$ and its derivative
do not have a common root.)

The multiplication-by-m map (25.2.10) is defined by rational functions in
the sense that $x_{mP}$ and $y_{mP}$ can be expressed as elements of $\mathbb{Q}(A, B, x_P, y_P)$.
For example, the duplication formula (25.2.9) gives such an expression for
$x_{2P}$. Maps $E \to E$ defined by rational functions and sending O to O are
called endomorphisms of E. Endomorphisms can be added and multiplied
(composed) according to the rules

$(\phi + \psi)(P) = \phi(P) + \psi(P)$ and $(\phi\psi)(P) = \phi(\psi(P))$,

and one can show that with these operations, the set of endomorphisms
End(E) becomes a ring.†

For most elliptic curves (over fields of characteristic 0), the only
endomorphisms are the multiplication-by-m maps, so for these curves
End(E) = $\mathbb{Z}$. Curves that admit additional endomorphisms are said to
have complex multiplication (or CM, for short). Examples of such curves
include

$E_5: y^2 = x^3 + Ax$, which has the endomorphism $\phi_i(x,y) = (-x, iy)$,

and

$E_6: y^2 = x^3 + B$, which has the endomorphism $\phi_\rho(x,y) = (\rho x, y)$.

(Here $i = \sqrt{-1}$ and $\rho = e^{2\pi i/3}$ are as in Chapter XII.) These endomorphisms
satisfy

$\phi_i^2(P) = -P$ and $\phi_\rho^2(P) + \phi_\rho(P) + P = O$.

One can show that End($E_5$) is isomorphic to the ring of Gaussian integers
and that End($E_6$) is the ring of integers in $k(\rho)$. This is typical in the sense
that the endomorphism ring of a CM elliptic curve over a field of characteristic
0 is always a subring of a quadratic imaginary field. In particular, the
composition of endomorphisms is commutative, i.e. $\phi(\psi(P)) = \psi(\phi(P))$
for all $P \in E$.‡

† The hardest part of the proof is the distributive law, i.e. to show that the mere fact that $\phi$ is defined
by rational functions implies that $\phi$ satisfies $\phi(P + Q) = \phi(P) + \phi(Q)$.

‡ However, it should be noted that there are elliptic curves defined over finite fields whose
endomorphism rings are noncommutative.

25.3. Other equations that define elliptic curves. A homogeneous
polynomial equation

(25.3.1) $F(X,Y,Z) = \sum_{i+j+k=d} A_{ijk}X^iY^jZ^k = 0$

of degree d is said to be nonsingular if the simultaneous equations

$F(X,Y,Z) = \dfrac{\partial}{\partial X}F(X,Y,Z) = \dfrac{\partial}{\partial Y}F(X,Y,Z) = \dfrac{\partial}{\partial Z}F(X,Y,Z) = 0$

have no (complex) solutions other than X = Y = Z = 0. One can show that
any nonsingular equation (25.3.1) of degree 3 with a specified nontrivial
solution $P_0 = (x_0 : y_0 : z_0)$ is an elliptic curve in the sense that it may be
transformed by rational functions into an equation of the form

(25.3.2) $y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$,

with the point $P_0$ being sent to the point O sitting at infinity. Further, if k
is a field containing all of the $A_{ijk}$ and containing the coordinates $x_0, y_0, z_0$
of $P_0$, then k also contains the new coefficients $a_1, \ldots, a_6$. An equation of
the form (25.3.2) is called a generalized Weierstrass equation.

The following example illustrates this general principle and is useful for 
applications. 

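Theorem 465. The nonzero solutions to the equation

(25.3.3) $X^3 + Y^3 = A$

are mapped bijectively, via the function

(25.3.4) $(X, Y) \longmapsto \left(\dfrac{12A}{X+Y}, \dfrac{36A(X-Y)}{X+Y}\right)$,

to the solutions (with $x \neq 0$) of the equation

(25.3.5) $y^2 = x^3 - 432A^2$.

The inverse map is given by

(25.3.6) $(x, y) \longmapsto \left(\dfrac{36A + y}{6x}, \dfrac{36A - y}{6x}\right)$.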
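
The two maps can be tried on a small example. The Python sketch below is an illustration only and not part of the original text: the function names are ours, and the maps are taken to be $(X,Y) \mapsto (12A/(X+Y),\ 36A(X-Y)/(X+Y))$ and $(x,y) \mapsto ((36A+y)/(6x),\ (36A-y)/(6x))$, which should be regarded as an assumption that the code itself checks on the sample point.

```python
from fractions import Fraction

def cubic_to_weierstrass(X, Y, A):
    """Send a solution of X^3 + Y^3 = A (with X + Y != 0) to y^2 = x^3 - 432 A^2."""
    X, Y, A = map(Fraction, (X, Y, A))
    s = X + Y
    return 12 * A / s, 36 * A * (X - Y) / s

def weierstrass_to_cubic(x, y, A):
    """Inverse map, defined for x != 0."""
    x, y, A = map(Fraction, (x, y, A))
    return (36 * A + y) / (6 * x), (36 * A - y) / (6 * x)

# 1^3 + 2^3 = 9, so (1, 2) is a solution with A = 9.
x, y = cubic_to_weierstrass(1, 2, 9)
print(x, y)                               # 36 -108
print(y * y == x**3 - 432 * 9**2)
print(weierstrass_to_cubic(x, y, 9) == (1, 2))
```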

It is an elementary calculation to verify that the maps (25.3.4) and
(25.3.6) take the curves (25.3.3) and (25.3.5) to one another and that
the composition of the maps is the identity. The curve (25.3.3) has three
points at infinity, corresponding to setting Z = 0 in the homogeneous form
$X^3 + Y^3 = AZ^3$. The transformation (25.3.4) identifies the point $(1 : -1 : 0)$
on (25.3.3) with the unique point at infinity on (25.3.5).

The discriminant of a generalized Weierstrass equation (25.3.2) is given
by the rather complicated expression†

(25.3.7) $\Delta = -a_1^6a_6 + a_1^5a_3a_4 - a_1^4a_2a_3^2 - 12a_1^4a_2a_6 + a_1^4a_4^2$
$\qquad + 8a_1^3a_2a_3a_4 + a_1^3a_3^3 + 36a_1^3a_3a_6 - 8a_1^2a_2^2a_3^2$
$\qquad - 48a_1^2a_2^2a_6 + 8a_1^2a_2a_4^2 - 30a_1^2a_3^2a_4 + 72a_1^2a_4a_6$
$\qquad + 16a_1a_2^2a_3a_4 + 36a_1a_2a_3^3 + 144a_1a_2a_3a_6 - 96a_1a_3a_4^2$
$\qquad - 16a_2^3a_3^2 - 64a_2^3a_6 + 16a_2^2a_4^2 + 72a_2a_3^2a_4 + 288a_2a_4a_6$
$\qquad - 27a_3^4 - 216a_3^2a_6 - 432a_6^2 - 64a_4^3.$

One can check at some length that the curve is nonsingular if and only
if $\Delta \neq 0$.

The most general transformation preserving the Weierstrass equation
form (25.3.2) is

(25.3.8) $x = u^2x' + r$ and $y = u^3y' + u^2sx' + t$ with $u \neq 0$.

The effect of the transformation (25.3.8) on the discriminant is $\Delta' = u^{-12}\Delta$.

When investigating integral or rational points on an elliptic curve
(25.3.2), it is often advantageous to impose a minimality condition on
the equation that is analogous to writing a fraction in lowest terms. An
equation (25.3.2) is called a (global) minimal Weierstrass equation if,
among all equations obtained from it by transformations (25.3.8) with
$r, s, t \in \mathbb{Q}$ and $u \in \mathbb{Q}^*$ and having coefficients $a_1, \ldots, a_6 \in \mathbb{Z}$,
it has the smallest value of $|\Delta|$.


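† The astute reader will have noted that this new discriminant (25.3.7) is $-16$ times our old discriminant
(25.1.4). The extra factor is of importance only when working with the prime p = 2, in which case the
new version is the more appropriate.

If the characteristic of k is not equal to 2 or 3, then the substitution

$x = x' - \tfrac{1}{12}a_1^2 - \tfrac{1}{3}a_2, \qquad y = y' - \tfrac{1}{2}a_1x' + \tfrac{1}{24}a_1^3 + \tfrac{1}{6}a_1a_2 - \tfrac{1}{2}a_3$

transforms (25.3.2) into an equation $y'^2 = x'^3 + Ax' + B$ of the form (25.1.3), with

$A = -\tfrac{1}{48}a_1^4 - \tfrac{1}{6}a_1^2a_2 + \tfrac{1}{2}a_1a_3 - \tfrac{1}{3}a_2^2 + a_4,$

$B = \tfrac{1}{864}a_1^6 + \tfrac{1}{72}a_1^4a_2 - \tfrac{1}{24}a_1^3a_3 + \tfrac{1}{18}a_1^2a_2^2 - \tfrac{1}{12}a_1^2a_4 - \tfrac{1}{6}a_1a_2a_3 + \tfrac{2}{27}a_2^3 - \tfrac{1}{3}a_2a_4 + \tfrac{1}{4}a_3^2 + a_6.$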
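
The long expression for the discriminant is easier to handle through auxiliary quantities. The sketch below is an illustration only and not part of the original text: it uses the standard abbreviations $b_2, b_4, b_6, b_8$ (which the text does not introduce), for which $\Delta = -b_2^2b_8 - 8b_4^3 - 27b_6^2 + 9b_2b_4b_6$, and it assumes that completing the square and the cube gives $A = -b_2^2/48 + b_4/2$ and $B = b_2^3/864 - b_2b_4/24 + b_6/4$. The function names are ours.

```python
from fractions import Fraction

def b_invariants(a1, a2, a3, a4, a6):
    """Standard auxiliary quantities attached to a generalized Weierstrass equation."""
    b2 = a1 * a1 + 4 * a2
    b4 = a1 * a3 + 2 * a4
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    return b2, b4, b6, b8

def discriminant(a1, a2, a3, a4, a6):
    """Discriminant of y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6."""
    b2, b4, b6, b8 = b_invariants(a1, a2, a3, a4, a6)
    return -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6

def short_form(a1, a2, a3, a4, a6):
    """Coefficients (A, B) of the equivalent equation y^2 = x^3 + A x + B."""
    b2, b4, b6, _ = (Fraction(b) for b in b_invariants(a1, a2, a3, a4, a6))
    A = -b2 * b2 / 48 + b4 / 2
    B = b2**3 / 864 - b2 * b4 / 24 + b6 / 4
    return A, B

coeffs = (1, 1, 1, 0, 0)  # y^2 + xy + y = x^3 + x^2, an arbitrary test curve
A, B = short_form(*coeffs)
print(discriminant(*coeffs), -16 * (4 * A**3 + 27 * B**2))
```

For this test curve both printed values are -15, which illustrates the factor -16 relating the two discriminants.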


25.4. Points of finite order. A point $P \in E$ has finite order if some
positive multiple mP of P is equal to O. The order of P is the smallest such
value of m. For example, Theorem 464 says that P has order 2 if and only if
$y_P = 0$. Using the theory of elliptic functions, one can show that the points
of order dividing m in $E(\mathbb{C})$ form a product of two cyclic groups of order m. In this
section, we prove an elegant theorem of Nagell and Lutz that characterizes
the points of finite order in $E(\mathbb{Q})$. In particular, there are only finitely many
such points, and the theorem gives an effective method for finding all of
them.


Theorem 466. Let E be an elliptic curve given by an equation (25.1.3)
having integer coefficients and let $P = (x,y) \in E(\mathbb{Q})$ be a point of finite
order. Then the coordinates of P are integers, and either y = 0 or else
$y^2 \mid 4A^3 + 27B^2$.
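
The theorem leads to a finite search. The Python sketch below is an illustration only and not part of the original text: the function name is ours, and the divisibility condition is taken in its usual Nagell and Lutz form, y = 0 or $y^2 \mid 4A^3 + 27B^2$. The points returned are only candidates, since the condition is necessary but not sufficient for a point to have finite order.

```python
from math import isqrt

def nagell_lutz_candidates(A, B):
    """Integer points (x, y) on y^2 = x^3 + A x + B with y = 0 or y^2 | 4A^3 + 27B^2."""
    disc = 4 * A**3 + 27 * B**2
    if disc == 0:
        raise ValueError("singular curve")
    ys = [0] + [y for y in range(1, isqrt(abs(disc)) + 1) if disc % (y * y) == 0]
    points = []
    for y in ys:
        c = B - y * y                       # seek integer roots of x^3 + A x + c
        bound = 1 + max(abs(A), abs(c))     # Cauchy bound on the size of any root
        for x in range(-bound, bound + 1):
            if x**3 + A * x + c == 0:
                points.append((x, y))
                if y:
                    points.append((x, -y))
    return points

print(sorted(nagell_lutz_candidates(0, 1)))      # y^2 = x^3 + 1
print(sorted(nagell_lutz_candidates(-43, 166)))  # y^2 = x^3 - 43x + 166
```

For $y^2 = x^3 + 1$ the discriminant is 27, so y is 0, 1 or 3 in absolute value, and the candidates are (-1, 0), (0, ±1) and (2, ±3).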

It is often convenient to move the ‘point at infinity’ on the equation
(25.1.3) to the point (0, 0) by introducing the change of coordinates

(25.4.1) $z = \dfrac{x}{y}, \qquad w = \dfrac{1}{y}.$

The new equation for the elliptic curve is

(25.4.2) $E: w = z^3 + Azw^2 + Bw^3$,

and the point O is now the point (z, w) = (0, 0). (The three points on the
curve with y = 0, i.e. the points of order 2, have been moved ‘to infinity’.)
We observe that the transformation (25.4.1) sends lines to lines; for example,
the line $y = \lambda x + \nu$ in the (x,y)-plane becomes the line $1 = \lambda z + \nu w$ in
the (z,w)-plane. This means that we can add points on E in the (z,w)-plane
using the same procedure that we used in the (x,y)-plane. We now derive
explicit formulae for the (z,w) addition law.

Theorem 467. Let E be an elliptic curve given by (25.4.2) and let $P = (z_P, w_P)$
and $Q = (z_Q, w_Q)$ be points on E. Set

(25.4.3) $\alpha = \dfrac{z_Q^2 + z_Pz_Q + z_P^2 + Aw_P^2}{1 - Az_Q(w_Q + w_P) - B(w_Q^2 + w_Pw_Q + w_P^2)}, \qquad \beta = w_P - \alpha z_P.$

Then the z-coordinate of P + Q is given by the formula

(25.4.4) $z_{P+Q} = \dfrac{2A\alpha\beta + 3B\alpha^2\beta}{1 + A\alpha^2 + B\alpha^3} + z_P + z_Q.$

(If $z_P = z_Q$ and $w_P \neq w_Q$, then $\alpha$ is formally equal to $\infty$, so (25.4.4) must
be interpreted as $\alpha \to \infty$ and $\beta/\alpha \to -z_P$, which yields $z_{P+Q} = -z_P$ in
this case.†)

† If also B = 0, then the formulae need a small further modification that we leave to the reader.

The proof of Theorem 467 is not difficult, but it requires a certain amount
of algebraic manipulation of formulae. Suppose first that $z_P \neq z_Q$, so the
line $w = \alpha z + \beta$ through P and Q has slope

$\alpha = \dfrac{w_Q - w_P}{z_Q - z_P}.$

The points P and Q both satisfy (25.4.2). Subtracting gives

(25.4.5) $w_Q - w_P = (z_Q^3 - z_P^3) + A(z_Qw_Q^2 - z_Pw_P^2) + B(w_Q^3 - w_P^3)$
$\qquad = (z_Q^3 - z_P^3) + Az_Q(w_Q^2 - w_P^2) + A(z_Q - z_P)w_P^2 + B(w_Q^3 - w_P^3).$

Every term in (25.4.5) is divisible by either $w_Q - w_P$ or $z_Q - z_P$, so a small
amount of algebra yields

(25.4.6) $\dfrac{w_Q - w_P}{z_Q - z_P} = \dfrac{z_Q^2 + z_Pz_Q + z_P^2 + Aw_P^2}{1 - Az_Q(w_Q + w_P) - B(w_Q^2 + w_Pw_Q + w_P^2)}.$

If instead P = Q, then $\alpha$ is the slope of the tangent line to E at P, which is found by
implicit differentiation of (25.4.2):

(25.4.7) $\dfrac{dw}{dz} = \dfrac{3z_P^2 + Aw_P^2}{1 - 2Az_Pw_P - 3Bw_P^2}.$

We observe that (25.4.6) becomes equal to (25.4.7) if we make the substitution
$(z_Q, w_Q) = (z_P, w_P)$, so we may also use (25.4.6) in this
case.

The line $L: w = \alpha z + \beta$ intersects the curve E at the points P and Q and a
third point R. Substituting $w = \alpha z + \beta$ into (25.4.2) gives a cubic equation
whose roots, with appropriate multiplicities, are $z_P$, $z_Q$, and $z_R$. Thus there
is a constant C so that

$z^3 + Az(\alpha z + \beta)^2 + B(\alpha z + \beta)^3 - (\alpha z + \beta) = C(z - z_P)(z - z_Q)(z - z_R).$

Comparing the coefficients of $z^2$ and $z^3$ yields

$z_P + z_Q + z_R = -\dfrac{2A\alpha\beta + 3B\alpha^2\beta}{1 + A\alpha^2 + B\alpha^3}.$

The points P, Q, and R satisfy P + Q + R = O, so P + Q = -R. Finally
we note that the negative of a point on E in the (z,w)-plane is given by
$-(z, w) = (-z, -w)$, so the z-coordinate of P + Q is $-z_R$.

It remains to deal with the case $z_P = z_Q$ and $w_P \neq w_Q$. Then the line L
through P and Q is the line $z = z_P$, and, provided $B \neq 0$, the line L intersects
E at 3 points in the zw-plane. The third point $R = (z_R, w_R)$ necessarily
satisfies $z_R = z_P$, since it lies on L, and then $z_{P+Q} = z_{-R} = -z_R = -z_P$.
This completes the proof of Theorem 467.
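
The formula of Theorem 467 can be compared with the addition law in the (x,y)-plane on a small example. The sketch below is an illustration only and not part of the original text: the function names are ours, the slope is taken to be $\alpha = (z_Q^2 + z_Pz_Q + z_P^2 + Aw_P^2)/(1 - Az_Q(w_Q + w_P) - B(w_Q^2 + w_Pw_Q + w_P^2))$ with $\beta = w_P - \alpha z_P$, and the exceptional case $z_P = z_Q$, $w_P \neq w_Q$ is not handled. The sample points (3, 8) and (-5, -16) lie on $y^2 = x^3 - 43x + 166$, and their sum in the (x,y)-plane is (11, -32), whose z-coordinate is -11/32.

```python
from fractions import Fraction

def zw(P):
    """Change of coordinates (x, y) -> (z, w) = (x/y, 1/y), valid for y != 0."""
    x, y = P
    return x / y, 1 / y

def z_of_sum(P, Q, A, B):
    """z-coordinate of P + Q for points given in (z, w) coordinates.

    The case z_P == z_Q with w_P != w_Q is excluded, and the denominators
    below are assumed to be nonzero.
    """
    (zP, wP), (zQ, wQ) = P, Q
    alpha = (zQ * zQ + zP * zQ + zP * zP + A * wP * wP) / (
        1 - A * zQ * (wQ + wP) - B * (wQ * wQ + wP * wQ + wP * wP)
    )
    beta = wP - alpha * zP
    return (2 * A * alpha * beta + 3 * B * alpha**2 * beta) / (
        1 + A * alpha**2 + B * alpha**3
    ) + zP + zQ

A, B = -43, 166
P = zw((Fraction(3), Fraction(8)))
Q = zw((Fraction(-5), Fraction(-16)))
print(z_of_sum(P, Q, A, B))  # -11/32
print(z_of_sum(P, P, A, B))  # 5/16, the z-coordinate of 2P = (-5, -16)
```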

We shall prove that points of finite order have integral coordinates by
demonstrating that there are no primes dividing their denominators. For
this purpose we fix a prime p and let

$R_p = \{a/b \in \mathbb{Q} : p \nmid b\}.$

It is easily verified that $R_p$ is closed under addition, subtraction, and multiplication,
so $R_p$ is a subring of $\mathbb{Q}$. Further, divisibility may be defined in
$R_p$ just as it was for $\mathbb{Z}$. The unities in $R_p$, i.e. the elements with multiplicative
inverses, are precisely those rational numbers whose numerators and
denominators are both relatively prime to p. We may reduce elements of
$R_p$ modulo p, and the theory of congruences described in §§ 5.2 and 5.3
remains valid.†

We define the p-adic valuation $v_p(a)$ of a nonzero integer a to be the exponent
of the largest power of p that divides a, and we extend the definition
to rational numbers by setting

$v_p\left(\dfrac{a}{b}\right) = v_p(a) - v_p(b).$

We also formally set $v_p(0) = \infty$ to be larger than every real number. Notice
that $R_p$ is characterized by

$R_p = \{\alpha \in \mathbb{Q} : v_p(\alpha) \geq 0\}.$

The following properties of $v_p$ are easily verified:‡

(25.4.8) $v_p(\alpha\beta) = v_p(\alpha) + v_p(\beta)$,

(25.4.9) $v_p(\alpha + \beta) \geq \min\{v_p(\alpha), v_p(\beta)\}$.

Further, in the case of unequal valuation we have equality in (25.4.9),

(25.4.10) $v_p(\alpha) \neq v_p(\beta) \Longrightarrow v_p(\alpha + \beta) = \min\{v_p(\alpha), v_p(\beta)\}$.
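
The valuation is simple to compute. The following Python sketch is an illustration only and not part of the original text: the function name `vp` is ours, and the case of 0 (valuation $\infty$) is reported as an error rather than represented.

```python
from fractions import Fraction

def vp(q, p):
    """p-adic valuation of a nonzero rational number q."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("v_p(0) is infinite")
    n, d, v = q.numerator, q.denominator, 0
    while n % p == 0:       # powers of p in the numerator count positively
        n //= p
        v += 1
    while d % p == 0:       # powers of p in the denominator count negatively
        d //= p
        v -= 1
    return v

a, b = Fraction(50, 3), Fraction(7, 25)
print(vp(a, 5), vp(b, 5))   # 2 -2
print(vp(a * b, 5))         # 0, as (25.4.8) predicts
print(vp(a + b, 5))         # -2, the minimum, since the valuations differ
```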

† $R_p$ is an example of a local ring, i.e. a ring with a single maximal ideal.
‡ Properties (25.4.8) and (25.4.9) say that the function $v_p: \mathbb{Q}^* \to \mathbb{Z}$ is a discrete valuation.

Theorem 468. Let E be an elliptic curve given by equations (25.1.3) and
(25.4.2) having integer coefficients and let P = (x,y) = (z,w) be a point
on E having rational coordinates. Then

$v_p(x) < 0 \iff v_p(y) < 0 \iff v_p(z) > 0 \iff v_p(w) > 0.$

If any of these equivalent conditions is true, then

$v_p(x) = -2v_p(z)$, $\quad v_p(y) = -3v_p(z)$, and $v_p(w) = 3v_p(z)$.

All of the assertions of Theorem 468 are immediate consequences
of the basic valuation rules (25.4.8), (25.4.9), and (25.4.10) applied to
equations (25.1.3) and (25.4.2) defining E.

Theorem 469. Let E be an elliptic curve given by an equation (25.4.2)
having integer coefficients. Let P and Q be points of E whose (z,w)-coordinates
are in $R_p$, and suppose that these points satisfy

(25.4.11) $z_P \equiv z_Q \equiv 0 \pmod{p^k}$ for some $k \geq 1$.

Then

(25.4.12) $z_{P+Q} \equiv z_P + z_Q \pmod{p^{5k}}$.

In particular, (25.4.11) implies that $z_{P+Q} \equiv 0 \pmod{p^k}$.

Theorem 468 and (25.4.11) tell us that $w_P \equiv w_Q \equiv 0 \pmod{p^{3k}}$. We
begin by ruling out the exceptional case in Theorem 467. Suppose that
$z_P = z_Q$. Subtracting (25.4.2) evaluated at P from (25.4.2) evaluated at Q
yields

    $(w_Q - w_P)\bigl(1 - Az_P(w_Q + w_P) - B(w_Q^2 + w_Pw_Q + w_P^2)\bigr) = 0$.

The second factor is congruent to 1 modulo p, hence $w_Q = w_P$.

Having ruled out the case $z_P = z_Q$ and $w_P \ne w_Q$, we see that the
quantities $\alpha$ and $\beta$ defined by (25.4.3) of Theorem 467 satisfy

    $\alpha \equiv 0 \pmod{p^{2k}}$ and $\beta \equiv 0 \pmod{p^{3k}}$.

Then (25.4.4) in Theorem 467 gives

    $z_{P+Q} = \frac{2A\alpha\beta + 3B\alpha^2\beta}{1 + A\alpha^2 + B\alpha^3} + z_P + z_Q \equiv z_P + z_Q \pmod{p^{5k}}$.


Theorem 469 provides the tools needed to prove the integrality statement
in Theorem 466. Let $P = (x_P, y_P) \in E(\mathbb{Q})$ be a point of finite order. We
are required to prove that $x_P$ and $y_P$ are integers. If $y_P = 0$, so 2P = O
from Theorem 464, then equation (25.1.3) of E shows that $x_P$ is an integer
and we are done. We assume henceforth that $y_P \ne 0$.

Suppose to the contrary that there is some prime p dividing the
denominator of $x_P$. Switching to (z, w) coordinates, Theorem 468 tells us that
$p \mid z_P$. Let $k = v_p(z_P) > 0$, so $p^k \mid z_P$ and $p^{k+1} \nmid z_P$. Repeated application of
(25.4.12) from Theorem 469 yields

(25.4.13) $z_{nP} \equiv nz_P \pmod{p^{5k}}$ for all $n \ge 1$.

We now make use of the assumption that P has finite order, so mP = O
for some $m \ge 1$. Setting n = m in (25.4.13) and using the fact that $z_O = 0$
gives

(25.4.14) $0 = z_O = z_{mP} \equiv mz_P \pmod{p^{5k}}$.

If $p \nmid m$, then (25.4.14) contradicts our assumption that $p^{k+1} \nmid z_P$, which
proves that p does not divide the denominator of $x_P$ and $y_P$.

It remains to deal with the case that p divides m. We write m = pm', set
P' = m'P, and let $k' = v_p(z_{P'})$. (Note that $k' \ge k \ge 1$ from (25.4.13) with
n = m'.) Since P' has order p, the same argument yields

    $0 = z_O = z_{pP'} \equiv pz_{P'} \pmod{p^{5k'}}$.

Hence $p^{5k'-1}$ divides $z_{P'}$, which is again a contradiction, since
$5k' - 1 > k'$. This completes the proof that the (x, y)-coordinates of
points of finite order are integers.

Now that we know that points of finite order have integral coordinates,
the second part of Theorem 466 is easy. First, Theorem 464 says that
2P = O if and only if y = 0, so we may assume that P = (x, y) has order
$m \ge 3$. Then P and 2P are both points of finite order, so from our previous
work we know that they both have integral coordinates. The duplication
formula (25.2.9) says that

(25.4.15) $x_{2P} = \frac{x_P^4 - 2Ax_P^2 - 8Bx_P + A^2}{4x_P^3 + 4Ax_P + 4B}$,

and a standard Euclidean algorithm or resultant calculation yields the
identity

(25.4.16) $(3x^2 + 4A)(x^4 - 2Ax^2 - 8Bx + A^2) - (3x^3 - 5Ax - 27B)(x^3 + Ax + B) = 4A^3 + 27B^2 = \Delta$.

Combining (25.4.15) and (25.4.16) with the basic relation $y^2 = x^3 + Ax + B$
gives

(25.4.17) $y_P^2\bigl(4(3x_P^2 + 4A)x_{2P} - (3x_P^3 - 5Ax_P - 27B)\bigr) = \Delta$.

All of the quantities in (25.4.17) are integers, which proves that $y_P^2$
divides $\Delta$.
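The two conditions just proved (integer coordinates, and y = 0 or $y^2 \mid \Delta$) reduce the search for points of finite order to a finite computation. The sketch below is an illustration under our own naming (torsion_candidates is not from the text); it lists every integer point meeting these necessary conditions. A listed point need not have finite order, since the conditions are necessary but not sufficient.

```python
from math import isqrt

def torsion_candidates(A: int, B: int):
    """Integer points (x, y) on y^2 = x^3 + A x + B with y = 0 or y^2 | Delta."""
    delta = 4 * A**3 + 27 * B**2
    if delta == 0:
        raise ValueError("singular curve: the discriminant vanishes")
    # Admissible values of |y|: zero, or any y whose square divides Delta.
    ys = [0] + [y for y in range(1, isqrt(abs(delta)) + 1) if delta % (y * y) == 0]
    points = []
    for y in ys:
        c = B - y * y                  # look for integer roots of x^3 + A x + c
        bound = 1 + abs(A) + abs(c)    # Cauchy bound on the size of any root
        for x in range(-bound, bound + 1):
            if x**3 + A * x + c == 0:
                points.append((x, y))
                if y != 0:
                    points.append((x, -y))
    return sorted(points)

# y^2 = x^3 + 1 has Delta = 27; the candidates are (-1, 0), (0, ±1), (2, ±3).
print(torsion_candidates(0, 1))
```

For $y^2 = x^3 - 2$ the list is empty, so the point (3, 5) on that curve cannot have finite order.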

25.5. The group of rational points. Points of finite order in E(Q) are
effectively determined by Theorem 466. Points of infinite order are far more
difficult to characterize. A fundamental theorem, due to Mordell for E(Q)
and generalized by Weil, states that every point in E(Q) can be written
as a linear combination of points taken from a finite set of generators,
where addition is always via the composition law on the elliptic
curve E.

Theorem 470. Let E be an elliptic curve given by an equation (25.1.3)
having rational coefficients. Then the group of rational points E(Q) is
finitely generated.

A standard algebraic result says that every finitely generated abelian
group is the direct sum of a finite group and a freely generated group. Thus
Theorem 470 implies the following more precise statement.

Theorem 471. Let E be an elliptic curve given by an equation (25.1.3)
having rational coefficients. There exists a finite set of points $P_1, \ldots, P_r$
in E(Q) such that every point $P \in E(\mathbb{Q})$ can be uniquely written in
the form

    $P = n_1P_1 + n_2P_2 + \cdots + n_rP_r + T$,

with $n_1, \ldots, n_r \in \mathbb{Z}$ and T a point of finite order. The nonnegative integer
r, which is uniquely determined by E(Q), is called the rank of E(Q).

We begin with an elementary lemma and some rank 0 cases of Theo- 
rem 470, after which we state a weak form of the theorem and use it to 
deduce the full theorem via a Fermat-style descent argument. 

Theorem 472. Let E be an elliptic curve given by an equation (25.1.3)
having rational coefficients and let P = (x, y) be a point of E with rational
coordinates. Then the coordinates of P may be written in the form

    $P = \left(\frac{a}{d^2}, \frac{b}{d^3}\right)$ with gcd(a, d) = gcd(b, d) = 1.

Theorem 472 is a consequence of Theorem 468, but we give a short direct
proof. We write the coordinates of P = (a/u, b/v) as fractions in lowest
terms with positive denominators and substitute into (25.1.3) to obtain

    $\frac{\text{(a number prime to } v)}{v^2} = \frac{\text{(a number prime to } u)}{u^3}$.

Hence $v^2 = u^3$, and on comparing the prime factorizations of v and u, we
see that there is an integer d such that $v = d^3$ and $u = d^2$.
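A quick numerical illustration of Theorem 472: the sketch below doubles the point (3, 5) on $y^2 = x^3 - 2$ using the usual tangent-line formula (assumed here to agree with the duplication formula (25.2.9)) and checks that the denominators of the result are a square and a cube of the same integer d. The curve, the point, and the helper name double are our choices for the example.

```python
from fractions import Fraction
from math import isqrt

def double(P, A):
    """Duplicate a finite point P = (x, y), y != 0, on y^2 = x^3 + A x + B."""
    x, y = P
    lam = (3 * x * x + A) / (2 * y)     # slope of the tangent line at P
    x2 = lam * lam - 2 * x
    y2 = lam * (x - x2) - y
    return (x2, y2)

A, B = 0, -2                            # the curve y^2 = x^3 - 2
P = (Fraction(3), Fraction(5))
Q = double(P, A)                        # Q = 2P = (129/100, -383/1000)

assert Q[1] ** 2 == Q[0] ** 3 + A * Q[0] + B   # Q lies on the curve
d = isqrt(Q[0].denominator)
assert Q[0].denominator == d ** 2 and Q[1].denominator == d ** 3
print(Q, d)
```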

Some of the Diophantine equations that we studied in Chapter XIII were 
elliptic curves. The next two theorems reformulate those results to prove a 
few rank 0 cases of Theorem 470. 

Theorem 473. The elliptic curve $E: y^2 = x^3 + x$ has rank zero. Its group
of rational points E(Q) = {(0, 0), O} is a cyclic group of order 2.

Let $P = (a/d^2, b/d^3) \in E(\mathbb{Q})$. Then

(25.5.1) $b^2 = a^3 + ad^4 = a(a^2 + d^4)$,

and the fact that gcd(a, d) = 1 implies that the factors in (25.5.1) are squares,
say

    $a = u^2$ and $a^2 + d^4 = v^2$.

Eliminating a yields $u^4 + d^4 = v^2$, and then Theorem 226 tells us that
udv = 0. By assumption, $d \ne 0$, and v = 0 forces u = d = 0, so the only
solution is u = 0. Hence a = 0 and P = (0, 0).

Theorem 474. For each value of $B \in \{16, -144, -432, 3888\}$, the
elliptic curve

    $E_B : y^2 = x^3 + B$

has rank 0, that is, $E_B(\mathbb{Q})$ is finite.

Theorem 465 gives a map from the curve

    $C_A : X^3 + Y^3 = A$

to the curve $E_{-432A^2}$. This map, with at most a couple of exceptions,
identifies the set of rational points $C_A(\mathbb{Q})$ with the set of rational points
$E_{-432A^2}(\mathbb{Q})$.

An argument similar to that given in the proof of Theorem 472 shows that
every rational point in $C_A(\mathbb{Q})$ has the form (a/c, b/c), where the fractions
are in lowest terms. Thus

    $a^3 + b^3 = Ac^3$.

Theorem 228 for A = 1 and Theorem 232 for A = 3 tell us that

    $C_1(\mathbb{Q}) = \{(1, 0), (0, 1)\}$ and $C_3(\mathbb{Q}) = \emptyset$,

from which it follows that $E_{-432}(\mathbb{Q})$ and $E_{3888}(\mathbb{Q})$ are finite.

It is an algebraic exercise to verify that the following formula gives a
well-defined map from $E_B$ to $E_{-27B}$ that is at most 3-to-1 on $E_B(\mathbb{Q})$,†

    $E_B : y^2 = x^3 + B \longrightarrow E_{-27B} : y^2 = x^3 - 27B$,

    $(x, y) \longmapsto \left(\frac{x^3 + 4B}{x^2}, \frac{y(x^3 - 8B)}{x^3}\right)$.

Taking B = 16 gives a map $E_{16}(\mathbb{Q}) \to E_{-432}(\mathbb{Q})$, so $E_{16}(\mathbb{Q})$ is finite, and
similarly taking B = -144 shows that $E_{-144}(\mathbb{Q})$ is finite.
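The "algebraic exercise" can be spot-checked with exact rational arithmetic. The sketch below (helper names phi and on_curve are ours) applies the map to two sample points with $x \ne 0$ and confirms that the images satisfy $y^2 = x^3 - 27B$. It is a numerical check on examples, not a substitute for the algebraic verification.

```python
from fractions import Fraction

def phi(x, y, B):
    """Send (x, y) on y^2 = x^3 + B to a point on y^2 = x^3 - 27 B (needs x != 0)."""
    x, y = Fraction(x), Fraction(y)
    return ((x**3 + 4 * B) / x**2, y * (x**3 - 8 * B) / x**3)

def on_curve(x, y, B):
    """True when (x, y) satisfies y^2 = x^3 + B."""
    return y * y == x**3 + B

# Sample points chosen for illustration: (2, 3) on E_1 and (3, 5) on E_{-2}.
for x, y, B in [(2, 3, 1), (3, 5, -2)]:
    assert on_curve(x, y, B)
    X, Y = phi(x, y, B)
    assert on_curve(X, Y, -27 * B)
    print((x, y), "->", (X, Y))
```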

We now take up the proof of Theorem 470, which is traditionally divided
into two parts. The first part we state without proof, since it requires tools
beyond our disposal.‡

Theorem 475. Let E be an elliptic curve given by an equation (25.1.3)
having rational coefficients. Then the quotient group E(Q)/2E(Q) is finite,
i.e. there is a finite set of points $Q_1, \ldots, Q_k \in E(\mathbb{Q})$ such that every Q in
E(Q) can be written in the form

    $Q = Q_i + 2Q'$

for some $1 \le i \le k$ and some $Q' \in E(\mathbb{Q})$.

The second part of the proof of Theorem 470 is a descent argument very
much in the spirit of Fermat. Making a change of variables of the form
$x = u^2x'$ and $y = u^3y'$ for an appropriate rational number u, we may
assume that the equation (25.1.3) defining E has integer coefficients.

For the descent, we shall use height functions to measure the arithmetic
size of points in E(Q). The height of a rational number $t \in \mathbb{Q}$ is the quantity

    $H(t) = H\left(\frac{a}{b}\right) = \max\{|a|, |b|\}$ for $t = \frac{a}{b} \in \mathbb{Q}$ with gcd(a, b) = 1,

and the height of a point $P = (x_P, y_P) \in E(\mathbb{Q})$ is then defined by

    $H(P) = H(x_P)$ if $P \ne O$, and $H(O) = 1$.

It is clear that there are only finitely many rational numbers of height less
than any given bound, and similarly for points in E(Q), since each rational
x-coordinate gives at most two rational y-coordinates.

† The map is exactly 3-to-1 on complex points $E_B(\mathbb{C}) \to E_{-27B}(\mathbb{C})$. Maps between elliptic curves
defined by rational functions are called isogenies.

‡ If the cubic equation $x^3 + Ax + B$ in (25.1.3) has a rational root, then Theorem 470 admits an
elementary, albeit lengthy, proof, which may be found, for example, in Silverman-Tate, Rational
Points on Elliptic Curves, Chapter III.

The key to performing the descent is to understand the effect of the group
law on the heights of points.

Theorem 476. Let E be an elliptic curve given by an equation (25.1.3)
having integer coefficients. There are constants $c_1$ and $c_2 > 0$ so that

(25.5.2) $H(P + Q) \le c_1H(P)^2H(Q)^2$ for all $P, Q \in E(\mathbb{Q})$,

(25.5.3) $H(2P) \ge c_2H(P)^4$ for all $P \in E(\mathbb{Q})$.

The height function satisfies $H \ge 1$, so both (25.5.2) and (25.5.3) are
true with $c_1 = c_2 = 1$ if either P = O or Q = O. Similarly, if P + Q = O,
then (25.5.2) is true with $c_1 = 1$. We consider the remaining cases.

We use Theorem 472 to write

    $P = (x_P, y_P) = \left(\frac{a_P}{d_P^2}, \frac{b_P}{d_P^3}\right)$ and $Q = (x_Q, y_Q) = \left(\frac{a_Q}{d_Q^2}, \frac{b_Q}{d_Q^3}\right)$.

Assuming that $P \ne Q$, the addition formulae (25.2.3), (25.2.7), (25.2.8)
give

(25.5.4) $x_{P+Q} = \left(\frac{y_Q - y_P}{x_Q - x_P}\right)^2 - x_P - x_Q$

    $= \frac{(x_Px_Q + A)(x_P + x_Q) + 2B - 2y_Py_Q}{(x_P - x_Q)^2}$

    $= \frac{(a_Pa_Q + Ad_P^2d_Q^2)(a_Pd_Q^2 + a_Qd_P^2) + 2Bd_P^4d_Q^4 - 2b_Pd_Pb_Qd_Q}{(a_Pd_Q^2 - a_Qd_P^2)^2}$.

The height of a rational number can only decrease if there is cancellation
between numerator and denominator, so (25.5.4) and the triangle inequality
yield

(25.5.5) $H(x_{P+Q}) \le c_3 \max\{|a_P|^2, |d_P|^4, |b_Pd_P|\} \times \max\{|a_Q|^2, |d_Q|^4, |b_Qd_Q|\}$.

(Explicitly, we may take $c_3 = 4 + 2|A| + 2|B|$.) Next we observe that since
P and Q are points on the curve, their coordinates satisfy

    $b_P^2 = a_P^3 + Aa_Pd_P^4 + Bd_P^6$ and $b_Q^2 = a_Q^3 + Aa_Qd_Q^4 + Bd_Q^6$.

Hence

(25.5.6) $|b_P| \le c_4 \max\{|a_P|^{3/2}, |d_P|^3\}$ and $|b_Q| \le c_4 \max\{|a_Q|^{3/2}, |d_Q|^3\}$.

(Explicitly $c_4 = 1 + |A| + |B|$.) Substituting (25.5.6) into (25.5.5) yields

    $H(x_{P+Q}) \le c_3c_4^2 \max\{|a_P|^2, |d_P|^4\} \max\{|a_Q|^2, |d_Q|^4\} = c_1H(P)^2H(Q)^2$,

which completes the proof of (25.5.2) for $P \ne Q$. The proof for P = Q is
similar using the duplication formula (25.2.9) and may safely be left to the
reader.

We turn now to the lower bound (25.5.3). If the polynomial $x^3 + Ax + B$
has any rational roots, then we first insist that the positive constant $c_2$
satisfies

(25.5.7) $c_2 \le \min\{H(\xi)^{-4} : \xi \in \mathbb{Q}$ and $\xi^3 + A\xi + B = 0\}$.

Theorem 464 then tells us that (25.5.3) is true if 2P = O, so we may
assume that $2P \ne O$.

To ease notation, we write

    $x_P = \frac{\alpha}{\delta}$

as a fraction in lowest terms. We define polynomials

    $F(X, Z) = X^4 - 2AX^2Z^2 - 8BXZ^3 + A^2Z^4$,

    $G(X, Z) = 4X^3Z + 4AXZ^3 + 4BZ^4$,

and we use them to homogenize the duplication formula (25.2.9). Thus the
x-coordinate of 2P is given by

(25.5.8) $x_{2P} = \frac{F(\alpha, \delta)}{G(\alpha, \delta)}$.

The Euclidean algorithm or the theory of resultants tells us how to find
relationships that eliminate either X or Z from F and G, cf. (25.4.16).
Explicitly, if we define polynomials

(25.5.9) $f_1(X, Z) = 12X^2Z + 16AZ^3$,

(25.5.10) $g_1(X, Z) = 3X^3 - 5AXZ^2 - 27BZ^3$,

(25.5.11) $f_2(X, Z) = 4(4A^3 + 27B^2)X^3 - 4A^2BX^2Z + 4A(3A^3 + 22B^2)XZ^2 + 12B(A^3 + 8B^2)Z^3$,

(25.5.12) $g_2(X, Z) = A^2BX^3 + A(5A^3 + 32B^2)X^2Z + 2B(13A^3 + 96B^2)XZ^2 - 3A^2(A^3 + 8B^2)Z^3$,

then an elementary, but tedious, calculation verifies the two formal
identities

(25.5.13) $f_1(X, Z)F(X, Z) - g_1(X, Z)G(X, Z) = 4\Delta Z^7$,

(25.5.14) $f_2(X, Z)F(X, Z) + g_2(X, Z)G(X, Z) = 4\Delta X^7$.

(The minus sign in (25.5.13) is the one obtained by homogenizing (25.4.16).)
Here $\Delta = 4A^3 + 27B^2 \ne 0$ is the discriminant of E, as usual.
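The "elementary, but tedious, calculation" can be delegated to a machine. The Python sketch below evaluates both sides on a grid of integer values of X, Z, A, B. It checks the first identity in the form $f_1F - g_1G = 4\Delta Z^7$, with $g_1$ as in (25.5.10), and the second in the form $f_2F + g_2G = 4\Delta X^7$. Since each residual is a polynomial of degree at most 8 in each variable separately, vanishing on a grid with 9 values per variable forces it to vanish identically. The function names are ours.

```python
from itertools import product

def F(X, Z, A, B):
    return X**4 - 2*A*X**2*Z**2 - 8*B*X*Z**3 + A**2*Z**4

def G(X, Z, A, B):
    return 4*X**3*Z + 4*A*X*Z**3 + 4*B*Z**4

def f1(X, Z, A, B):
    return 12*X**2*Z + 16*A*Z**3

def g1(X, Z, A, B):
    return 3*X**3 - 5*A*X*Z**2 - 27*B*Z**3

def f2(X, Z, A, B):
    return (4*(4*A**3 + 27*B**2)*X**3 - 4*A**2*B*X**2*Z
            + 4*A*(3*A**3 + 22*B**2)*X*Z**2 + 12*B*(A**3 + 8*B**2)*Z**3)

def g2(X, Z, A, B):
    return (A**2*B*X**3 + A*(5*A**3 + 32*B**2)*X**2*Z
            + 2*B*(13*A**3 + 96*B**2)*X*Z**2 - 3*A**2*(A**3 + 8*B**2)*Z**3)

def residuals(X, Z, A, B):
    """Left side minus right side for the two identities; both should be 0."""
    D = 4*A**3 + 27*B**2
    v = (X, Z, A, B)
    r1 = f1(*v)*F(*v) - g1(*v)*G(*v) - 4*D*Z**7
    r2 = f2(*v)*F(*v) + g2(*v)*G(*v) - 4*D*X**7
    return r1, r2

grid = range(-4, 5)     # nine values per variable
identities_hold = all(residuals(*v) == (0, 0) for v in product(grid, repeat=4))
print(identities_hold)
```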

We substitute $X = \alpha$ and $Z = \delta$ into (25.5.13) and (25.5.14) to obtain

(25.5.15) $f_1(\alpha, \delta)F(\alpha, \delta) - g_1(\alpha, \delta)G(\alpha, \delta) = 4\Delta\delta^7$,

(25.5.16) $f_2(\alpha, \delta)F(\alpha, \delta) + g_2(\alpha, \delta)G(\alpha, \delta) = 4\Delta\alpha^7$.

From (25.5.15) and (25.5.16) and the fact that $\gcd(\alpha, \delta) = 1$, we see that

    $\gcd(F(\alpha, \delta), G(\alpha, \delta)) \mid 4\Delta$.

Hence there is at most a factor of $4\Delta$ cancellation between the numerator
and the denominator of (25.5.8), so

(25.5.17) $H(x_{2P}) \ge \frac{\max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\}}{|4\Delta|}$.

The identities (25.5.15) and (25.5.16) also allow us to estimate

(25.5.18) $|4\Delta\delta^7| \le 2\max\{|f_1(\alpha, \delta)|, |g_1(\alpha, \delta)|\} \times \max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\}$,

(25.5.19) $|4\Delta\alpha^7| \le 2\max\{|f_2(\alpha, \delta)|, |g_2(\alpha, \delta)|\} \times \max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\}$.

Looking at the explicit expressions (25.5.9)-(25.5.12) for $f_1$, $g_1$, $f_2$, and $g_2$,
we see that

(25.5.20) $\max\{|f_1(\alpha, \delta)|, |g_1(\alpha, \delta)|, |f_2(\alpha, \delta)|, |g_2(\alpha, \delta)|\} \le c_5 \max\{|\alpha|^3, |\delta|^3\}$,

where $c_5$ depends only on A and B. Combining (25.5.18), (25.5.19), and
(25.5.20) yields

(25.5.21) $4|\Delta| \max\{|\alpha|, |\delta|\}^7 \le 2c_5 \max\{|\alpha|, |\delta|\}^3 \cdot \max\{|F(\alpha, \delta)|, |G(\alpha, \delta)|\}$,

and then (25.5.17) and (25.5.21) imply that

    $H(x_{2P}) \ge (2c_5)^{-1} \max\{|\alpha|, |\delta|\}^4 \ge c_2H(x_P)^4$,

where we may take any positive $c_2 \le (2c_5)^{-1}$ satisfying (25.5.7). This
completes the proof of (25.5.3).

Theorem 476 is written in multiplicative form, in the sense that it relates
sums of points on E to products of their heights. It is convenient to rewrite
it using the logarithmic height

    $h(P) = \log H(P)$.

With this notation, the two inequalities of Theorem 476 become

(25.5.22) $h(P + Q) \le 2h(P) + 2h(Q) + C_1$ for all $P, Q \in E(\mathbb{Q})$,

(25.5.23) $h(2P) \ge 4h(P) - C_2$ for all $P \in E(\mathbb{Q})$,

where $C_1$ and $C_2$ are nonnegative constants depending only on E.
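The inequalities (25.5.22) and (25.5.23) say that doubling a point multiplies its logarithmic height by roughly 4, up to a bounded error. The sketch below illustrates this on one example of our choosing, the point (3, 5) on $y^2 = x^3 - 2$, using the tangent-line duplication formula (assumed to agree with (25.2.9)). It is a numerical illustration only and says nothing about the actual constants $C_1$ and $C_2$.

```python
from fractions import Fraction
from math import log

def height(t: Fraction) -> int:
    """H(t) = max(|a|, |b|) for t = a/b in lowest terms."""
    return max(abs(t.numerator), abs(t.denominator))

def double(P, A):
    """Tangent-line duplication on y^2 = x^3 + A x + B (requires y != 0)."""
    x, y = P
    lam = (3 * x * x + A) / (2 * y)
    x2 = lam * lam - 2 * x
    return (x2, lam * (x - x2) - y)

A = 0                                    # the curve y^2 = x^3 - 2
points = [(Fraction(3), Fraction(5))]    # P, then 2P, then 4P
for _ in range(2):
    points.append(double(points[-1], A))

log_heights = [log(height(x)) for x, _ in points]
ratios = [b / a for a, b in zip(log_heights, log_heights[1:])]
print([height(x) for x, _ in points])    # H(P) = 3, H(2P) = 129, ...
print(ratios)                            # h(2P)/h(P) and h(4P)/h(2P)
```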

We shall now prove that there is a set of points $S \subset E(\mathbb{Q})$ of bounded
height such that every point in E(Q) is a linear combination of the points
in S. This implies finite generation of E(Q) (Theorem 470), since sets of
bounded height are finite.

Theorem 475 tells us that there is a finite set of points $Q_1, \ldots, Q_k \in E(\mathbb{Q})$
such that every point in E(Q) differs from some $Q_j$ by a point in 2E(Q).
We set

(25.5.24) $C_3 = \frac{1}{2}\max\{h(Q_j) : 1 \le j \le k\} + \frac{C_1 + C_2}{4}$,

where $C_1$ and $C_2$ are the constants appearing in (25.5.22) and (25.5.23),
respectively, and we define our finite set of points $S \subset E(\mathbb{Q})$ by

(25.5.25) $S = \{R \in E(\mathbb{Q}) : h(R) \le 2C_3 + 1\}$.

Note in particular that $Q_1, \ldots, Q_k$ are in S.

Let $P_0$ be an arbitrary nonzero point in E(Q). We inductively
define a sequence of indices $j_1, j_2, j_3, \ldots$ and points $P_0, P_1, P_2, \ldots$ in E(Q)
satisfying

(25.5.26) $P_0 = 2P_1 + Q_{j_1}$, $P_1 = 2P_2 + Q_{j_2}$, $P_2 = 2P_3 + Q_{j_3}$, ....

The choice of the successive $P_i$ and $j_i$ need not be unique, but Theorem 475
ensures that at each stage there is at least one choice. We apply first (25.5.23)
and then (25.5.22) to show that the heights of the $P_i$ are rapidly decreasing.
Thus

(25.5.27) $h(P_i) \le \frac{1}{4}\bigl(h(2P_i) + C_2\bigr) = \frac{1}{4}\bigl(h(P_{i-1} - Q_{j_i}) + C_2\bigr)$

    $\le \frac{1}{4}\bigl(2h(P_{i-1}) + 2h(Q_{j_i}) + C_1 + C_2\bigr)$

    $\le \frac{1}{2}h(P_{i-1}) + C_3$,

where $C_3$ is defined by (25.5.24), and we have used the fact that h(-Q) =
h(Q), since h(Q) depends only on $x_Q$.

We apply (25.5.27), starting at $P_n$ and working backwards to $P_0$,

    $h(P_n) \le \frac{1}{2^n}h(P_0) + \left(1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{n-1}}\right)C_3 \le \frac{1}{2^n}h(P_0) + 2C_3$.

Hence if we choose n to satisfy $2^n \ge h(P_0)$, then the point $P_n$ is in the set
S defined by (25.5.25). Finally, using back-substitution on the sequence of
equations (25.5.26), we obtain

    $P_0 = 2^nP_n + \sum_{i=1}^{n} 2^{i-1}Q_{j_i}$,

so the original point $P_0$ is a linear combination of points in S. This
completes the proof that the finite set S is a generating set for the group
E(Q).

25.6. The group of points modulo p. It is instructive to investigate
elliptic curves whose coefficients lie in other fields, for example the field
of p elements, which we denote by $\mathbb{F}_p$.† The mod p points on the curve,

    $E(\mathbb{F}_p) = \{(x, y) \in \mathbb{F}_p^2 : y^2 \equiv x^3 + Ax + B \pmod{p}\} \cup \{O\}$,

can be added to one another via the usual addition formulae (25.2.2)-
(25.2.8), and they satisfy the usual properties as described in Theorem 462.
We can use the Legendre symbol (§ 6.5) to count the number of points in
$E(\mathbb{F}_p)$ by applying the fact that the congruence $y^2 \equiv a \pmod{p}$ has
$1 + \left(\frac{a}{p}\right)$ solutions. Thus

    $\#E(\mathbb{F}_p) = 1 + \sum_{x=0}^{p-1}\left(1 + \left(\frac{x^3 + Ax + B}{p}\right)\right) = p + 1 + \sum_{x=0}^{p-1}\left(\frac{x^3 + Ax + B}{p}\right)$.

We would expect the quantity $\left(\frac{x^3 + Ax + B}{p}\right)$ to be +1 and -1 approximately
equally often, so $\#E(\mathbb{F}_p)$ should be approximately p + 1. The validity of
this heuristic argument is put into a precise form in a theorem due to Hasse.

Theorem 477*. Let p be a prime number and let E be an elliptic curve
with coefficients in the finite field $\mathbb{F}_p$ of p elements. Then the number of
points of E with coordinates in $\mathbb{F}_p$ satisfies the estimate

    $|\#E(\mathbb{F}_p) - (p + 1)| < 2\sqrt{p}$.
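The point-counting formula translates directly into a few lines of code. The sketch below (function names are ours) evaluates the Legendre symbol by Euler's criterion, counts the points of $y^2 = x^3 + Ax + B$ over $\mathbb{F}_p$ for an odd prime p not dividing $4A^3 + 27B^2$, and compares the result with Hasse's estimate. The example curve $y^2 = x^3 + x$ is chosen only for illustration.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def count_points(A: int, B: int, p: int) -> int:
    """#E(F_p) = p + 1 + sum over x of ((x^3 + A x + B)/p), p an odd prime."""
    return p + 1 + sum(legendre(x**3 + A * x + B, p) for x in range(p))

for p in [5, 7, 11, 13, 101]:
    n = count_points(1, 0, p)            # the curve y^2 = x^3 + x
    # Hasse: |n - (p + 1)| < 2 sqrt(p), tested here as a comparison of squares.
    assert (n - (p + 1)) ** 2 < 4 * p
    print(p, n)
```

For p = 5 the count is 4: the points (0, 0), (2, 0), (3, 0) and O.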

† For simplicity, we assume that p is an odd prime. In order to work with elliptic curves over $\mathbb{F}_2$ or
over other fields of characteristic 2, it is necessary to use a generalized Weierstrass equation (25.3.2)
with a correspondingly more complicated expression (25.3.7) for the discriminant as discussed in
§25.3.

25.7. Integer points on elliptic curves. Elliptic curves frequently have
infinitely many points with rational coordinates, since the sum of two
rational points is again a rational point. The situation for points with integer
coordinates is much different, since a perusal of the rational functions used
in the addition formulae (25.2.2)-(25.2.8) makes it clear that the sum of
integer points need not be an integer point.

The principal theorem in this area, due to Siegel, says that an elliptic
curve has only finitely many integer points. We start by proving three
elementary cases of Siegel's theorem, continue with an example showing
the close connection between integer points on (elliptic) curves and the
theory of Diophantine approximation (Chapter XI), and conclude with the
full statement of Siegel's result.

Theorem 478*. The equation

(25.7.1) $y^2 = x^3 + 7$

has no solutions in integers.†

Suppose that (x, y) is an integer solution to (25.7.1). Note that x cannot
be even, since a number of the form 8k + 7 cannot be a square. We rewrite
(25.7.1) as

(25.7.2) $y^2 + 1 = x^3 + 8 = (x + 2)(x^2 - 2x + 4)$.

Since x is odd, we have

    $x^2 - 2x + 4 = (x - 1)^2 + 3 \equiv 3 \pmod{4}$,

so there exists some prime $p \equiv 3 \pmod{4}$ dividing $x^2 - 2x + 4$. Then
(25.7.2) implies that

    $y^2 \equiv -1 \pmod{p}$,

which is a contradiction of Theorem 82. Hence (25.7.1) has no integer
solutions.

Theorem 479*. The only solutions in integers to the equation

(25.7.3) $y^2 = x^3 - 2$

are (x, y) = (3, ±5).

† In fact, equation (25.7.1) has no solutions in rational numbers, but the proof requires different
methods and is significantly more difficult.

We work in the ring of integers in the quadratic field $k(\sqrt{-2})$, which
according to Theorem 238 is the set of numbers of the form

    $a + b\sqrt{-2}$ with $a, b \in \mathbb{Z}$.

The field $k(\sqrt{-2})$ is a Euclidean field (Theorem 246), so its elements have
unique factorization into primes, and its only unities are ±1 (Theorem
240).

We now suppose that (x, y) is a solution in rational integers to (25.7.3).
Our first observation is that x and y must be odd, since if $2 \mid x$, then

    $y^2 \equiv -2 \pmod{8}$,

which is not possible.

In the ring of integers of $k(\sqrt{-2})$ we have the factorization

(25.7.4) $x^3 = y^2 + 2 = (y + \sqrt{-2})(y - \sqrt{-2})$.

Any common factor of $y + \sqrt{-2}$ and $y - \sqrt{-2}$ must divide their sum 2y and
their difference $2\sqrt{-2}$. But neither factor in (25.7.4) is divisible by $\sqrt{-2}$,
since y is odd, so they have no common prime factors. Hence (25.7.4)
implies that each factor is a cube in the ring of integers of $k(\sqrt{-2})$, say

(25.7.5) $y + \sqrt{-2} = \xi^3$ and $y - \sqrt{-2} = \eta^3$.

Subtracting the second equation in (25.7.5) from the first yields

(25.7.6) $2\sqrt{-2} = \xi^3 - \eta^3 = (\xi - \eta)(\xi^2 + \xi\eta + \eta^2)$.

The equations (25.7.5) are complex conjugates of one another, so if we
write $\xi = a + b\sqrt{-2}$, then $\eta = a - b\sqrt{-2}$, and (25.7.6) becomes

    $2\sqrt{-2} = 2b\sqrt{-2}\,(3a^2 - 2b^2)$.

Hence b = 1 and a = ±1, which yields y = ±5 and x = 3.
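Theorems 478 and 479 can be sanity-checked by a direct search over a finite range of x. The sketch below (the function name integer_points and the search bound are our choices) looks for integer solutions of $y^2 = x^3 + k$ with $|x|$ at most a given bound. A finite search is of course no proof, but it agrees with both theorems on the range tested.

```python
from math import isqrt

def integer_points(k: int, bound: int):
    """Integer solutions (x, y) of y^2 = x^3 + k with |x| <= bound."""
    sols = []
    for x in range(-bound, bound + 1):
        rhs = x**3 + k
        if rhs < 0:
            continue                    # a negative number is not a square
        y = isqrt(rhs)
        if y * y == rhs:
            sols.append((x, y))
            if y != 0:
                sols.append((x, -y))
    return sols

print(integer_points(7, 1000))          # y^2 = x^3 + 7: expect no solutions
print(integer_points(-2, 1000))         # y^2 = x^3 - 2: expect (3, 5), (3, -5)
```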

Theorem 480*. Let A be a nonzero integer. Then every solution in
integers to the equation

    $x^3 + y^3 = A$ satisfies $x^2 + y^2 \le 2|A|$.

The elementary proof of Theorem 480 hinges on the fact that the cubic
form $x^3 + y^3$ factors as

    $x^3 + y^3 = (x + y)(x^2 - xy + y^2) = A$.

Since $A \ne 0$ forces $x + y \ne 0$, we have $|x + y| \ge 1$, so

    $|A| \ge |x^2 - xy + y^2| \ge \frac{1}{2}(x^2 + y^2)$.
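Theorem 480 turns the equation $x^3 + y^3 = A$ into a finite search: every solution lies in the box $|x|, |y| \le \sqrt{2|A|}$. The following sketch (function name ours) enumerates that box and returns all integer solutions.

```python
from math import isqrt

def sum_of_two_cubes(A: int):
    """All integer pairs (x, y) with x^3 + y^3 = A, for a nonzero integer A."""
    if A == 0:
        raise ValueError("the bound of Theorem 480 requires A != 0")
    r = isqrt(2 * abs(A))               # x^2 + y^2 <= 2|A| implies |x|, |y| <= r
    return [(x, y)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            if x**3 + y**3 == A]

print(sum_of_two_cubes(1729))           # (1, 12), (9, 10), (10, 9), (12, 1)
print(sum_of_two_cubes(7))              # (-1, 2), (2, -1)
```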

It is natural to attempt to repeat the proof of Theorem 480 for equations
such as

    $x^3 + 2y^3 = A$

by using the factorization

    $(x + \sqrt[3]{2}\,y)(x^2 - \sqrt[3]{2}\,xy + \sqrt[3]{4}\,y^2) = A$.

It turns out that the integers in the field $k(\sqrt[3]{2})$ satisfy the fundamental
theorem, but the existence of infinitely many unities prevents the
elementary proof from succeeding. In general, the existence of integral
points on elliptic curves is closely tied up with the theory of Diophantine
approximation.

Theorem 481*. Let d be an integer that is not a perfect cube and let A
be a nonzero integer. Then the equation

(25.7.7) $x^3 + dy^3 = A$

has only finitely many solutions in integers.

In order to prove Theorem 481, we require a result on Diophantine
approximation that is stronger than Theorem 191. Such estimates were
proven by Thue, Siegel, Gelfond, and Dyson before culminating in the
following theorem of Roth (see the Notes to Chapter XI).

Theorem 482*. Let $\xi$ be an algebraic number of degree at least 2 as
defined in § 11.5. Then for every $\epsilon > 0$ there is a positive constant C,
depending on $\xi$ and $\epsilon$, so that

    $\left|\frac{a}{b} - \xi\right| \ge \frac{C}{b^{2+\epsilon}}$

for all rational numbers a/b written in lowest terms with b > 0.

The proof of Theorem 482, or even a weaker version in which the
exponent on b is any value strictly smaller than the degree of $\xi$, would take us
too far afield. So we shall be content to use Theorem 482 in order to prove
Theorem 481.

To ease notation, we let $\delta = \sqrt[3]{d}$, and we let $\rho = \frac{1}{2}(-1 + \sqrt{-3})$ be a
cube root of unity as in Chapter XII. We also replace y by -y, so equation
(25.7.7) factors completely as

    $x^3 - dy^3 = (x - \delta y)(x - \rho\delta y)(x - \rho^2\delta y) = A$.

We divide by $y^3$ to obtain

(25.7.8) $\left(\frac{x}{y} - \delta\right)\left(\frac{x}{y} - \rho\delta\right)\left(\frac{x}{y} - \rho^2\delta\right) = \frac{A}{y^3}$.

The real number x/y cannot be close to either of the complex numbers $\rho\delta$
or $\rho^2\delta$. Indeed,

    $\left|\frac{x}{y} - \rho\delta\right| \ge |\mathrm{Im}(\rho\delta)| = \frac{\sqrt{3}\,|\delta|}{2}$,

and similarly for $|x/y - \rho^2\delta|$. Hence (25.7.8) leads to the estimate

    $\frac{|A|}{|y|^3} \ge \left|\frac{x}{y} - \sqrt[3]{d}\right| \left(\frac{\sqrt{3}\,|\delta|}{2}\right)^2$.

Thus there is a constant C', which is independent of x and y, such that

(25.7.9) $\frac{C'}{|y|^3} \ge \left|\frac{x}{y} - \sqrt[3]{d}\right|$.

We now apply Theorem 482 with $\epsilon = \frac{1}{2}$ to the algebraic number $\sqrt[3]{d}$, which
gives a corresponding lower bound

(25.7.10) $\left|\frac{x}{y} - \sqrt[3]{d}\right| \ge \frac{C}{|y|^{5/2}}$.

Combining (25.7.9) and (25.7.10) yields

    $(C'/C)^2 \ge |y|$,

which shows that y takes on only finitely many values. Finally, the equation
$x^3 + dy^3 = A$ shows that each value of y leads to only finitely many values
for x.

An argument similar to, but significantly more complicated than, the 
proof of Theorem 481 was used by Siegel to show that an analogous result 
is true for all elliptic curves. 

Theorem 483*. Let E be an elliptic curve given by an equation having
rational coefficients. Then E has only finitely many points with integer
coordinates. In particular, the equation

    $y^2 = x^3 + Ax + B$ with $A, B \in \mathbb{Z}$ and $4A^3 + 27B^2 \ne 0$

has only finitely many solutions in integers.

Siegel's proof of Theorem 483 yields a stronger result saying, in effect,
that the numerators and the denominators of the coordinates of rational
points have approximately the same size.

Theorem 484*. Let E be an elliptic curve given by an equation having
rational coefficients and let $P_1, P_2, P_3, \ldots \in E(\mathbb{Q})$ be a sequence of distinct
rational points. Write the x-coordinate of $P_i$ as a fraction $x_{P_i} = \alpha_i/\beta_i$. Then

    $\lim_{i \to \infty} \frac{\log|\alpha_i|}{\log|\beta_i|} = 1$.

25.8. The L-series of an elliptic curve. Let E be an elliptic curve given
by a minimal Weierstrass equation† (25.3.2). For every prime p, we reduce
the coefficients of (25.3.2) modulo p and, provided that $p \nmid \Delta$, we obtain
an elliptic curve $E_p$ defined over the finite field $\mathbb{F}_p$. Theorem 477 tells us
that the quantity

(25.8.1) $a_p = p + 1 - \#E(\mathbb{F}_p)$ satisfies $|a_p| < 2\sqrt{p}$.

(If $p \mid \Delta$, we still define $a_p$ using (25.8.1). One can show in this case that
$a_p \in \{-1, 0, 1\}$.)

It is convenient to encapsulate all of this mod p information into a
generating function. The L-series of E is the infinite product

(25.8.2) $L(E, s) = \prod_{p \mid \Delta} \frac{1}{1 - a_pp^{-s}} \times \prod_{p \nmid \Delta} \frac{1}{1 - a_pp^{-s} + p^{1-2s}}$.

† If we ignore the primes p = 2 and p = 3, then it suffices to take an equation (25.1.3) with $A, B \in \mathbb{Z}$
and $\gcd(A^3, B^2)$ 12th power free.

The product (25.8.2) defining the L-series can be formally expanded into
a Dirichlet series

(25.8.3) $L(E, s) = \sum_{n \ge 1} \frac{a_n}{n^s}$

using the geometric series

    $\frac{1}{1 - a_pp^{-s}} = \sum_{k \ge 0} \frac{a_p^k}{p^{ks}}$ and $\frac{1}{1 - a_pp^{-s} + p^{1-2s}} = \sum_{k \ge 0} \left(\frac{a_p}{p^s} - \frac{1}{p^{2s-1}}\right)^k$.

Theorem 485*. The coefficients $a_n$ of the L-series L(E, s) have the
following properties:

(25.8.4) $a_{mn} = a_ma_n$ for all relatively prime m and n.

(25.8.5) $a_pa_{p^k} = a_{p^{k+1}} + pa_{p^{k-1}}$ for all prime powers $p^k$ with $k \ge 1$.

(25.8.6) $|a_n| \le d(n)\sqrt{n}$ for all $n \ge 1$.

(Here d(n) is the number of divisors of n, see § 16.7.)

The proofs of (25.8.4) and (25.8.5) are formal computations. First,
comparing (25.8.2) and (25.8.3), we see that

(25.8.7) $L(E, s) = \prod_p \sum_{k \ge 0} \frac{a_{p^k}}{p^{ks}}$.

Hence if we factor n as $n = p_1^{k_1}p_2^{k_2} \cdots p_t^{k_t}$, then

    $a_n = a_{p_1^{k_1}} a_{p_2^{k_2}} \cdots a_{p_t^{k_t}}$.

In particular, $a_{mn} = a_ma_n$ if gcd(m, n) = 1.

Next, for each prime $p \nmid \Delta$, we factor

(25.8.8) $1 - a_pX + pX^2 = (1 - \alpha_pX)(1 - \beta_pX)$ with $\alpha_p, \beta_p \in \mathbb{C}$.

For $p \mid \Delta$, we set $\alpha_p = a_p$ and $\beta_p = 0$, and then in all cases, the p-factor in
(25.8.2) is equal to

(25.8.9) $\frac{1}{(1 - \alpha_pp^{-s})(1 - \beta_pp^{-s})} = \sum_{k \ge 0} \frac{1}{p^{ks}} \sum_{i+j=k} \alpha_p^i\beta_p^j$.

(For $p \mid \Delta$, we set $0^0 = 1$ by convention.)
Comparing (25.8.9) and (25.8.7) yields

(25.8.10) $a_{p^k} = \sum_{i+j=k} \alpha_p^i\beta_p^j = \frac{\alpha_p^{k+1} - \beta_p^{k+1}}{\alpha_p - \beta_p}$.

Using (25.8.10) and the relation $\alpha_p\beta_p = p$ from (25.8.8), we compute

    $a_pa_{p^k} = (\alpha_p + \beta_p)\left(\frac{\alpha_p^{k+1} - \beta_p^{k+1}}{\alpha_p - \beta_p}\right) = \frac{\alpha_p^{k+2} - \beta_p^{k+2} + \alpha_p\beta_p(\alpha_p^k - \beta_p^k)}{\alpha_p - \beta_p} = a_{p^{k+1}} + pa_{p^{k-1}}$.

We verify (25.8.6) by applying Theorem 477, which tells us that
$|a_p| < 2\sqrt{p}$. This implies that the roots of the quadratic polynomial (25.8.8)
are complex conjugates, hence $\alpha_p$ and $\beta_p$ are complex conjugates whose
product is equal to p. They thus satisfy

(25.8.11) $|\alpha_p| = |\beta_p| = \sqrt{p}$.

Applying (25.8.11) to (25.8.10) gives

    $|a_{p^k}| \le \sum_{i+j=k} |\alpha_p^i\beta_p^j| = \sum_{i+j=k} p^{(i+j)/2} = (k + 1)p^{k/2} = d(p^k)p^{k/2}$.

Then the multiplicativity (25.8.4) of the $a_n$ and the multiplicativity of d(n)
from Theorem 273 imply that $|a_n| \le d(n)\sqrt{n}$.
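The recursion (25.8.5) and the bound (25.8.6) are easy to explore numerically at a prime of good reduction. The sketch below makes several simplifying assumptions that are ours, not the text's: it works with a short equation $y^2 = x^3 + Ax + B$, ignores the question of minimality, and treats only odd primes p not dividing $4A^3 + 27B^2$. It computes $a_p$ by point counting and then generates $a_{p^k}$ from $a_{p^{k+1}} = a_pa_{p^k} - pa_{p^{k-1}}$.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def a_p(A: int, B: int, p: int) -> int:
    """a_p = p + 1 - #E(F_p), which equals minus the Legendre-symbol sum."""
    return -sum(legendre(x**3 + A * x + B, p) for x in range(p))

def prime_power_coeffs(ap: int, p: int, kmax: int):
    """[a_1, a_p, ..., a_{p^kmax}] from a_{p^(k+1)} = a_p a_{p^k} - p a_{p^(k-1)}."""
    coeffs = [1, ap]
    for _ in range(kmax - 1):
        coeffs.append(ap * coeffs[-1] - p * coeffs[-2])
    return coeffs[:kmax + 1]

p = 5
ap = a_p(1, 0, p)                        # the curve y^2 = x^3 + x modulo 5
coeffs = prime_power_coeffs(ap, p, 6)
for k, c in enumerate(coeffs):
    # (25.8.6) at n = p^k reads |a_{p^k}| <= (k + 1) p^(k/2); compare squares.
    assert c * c <= (k + 1) ** 2 * p ** k
print(ap, coeffs)
```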

Theorem 486*. The L-series L(E, s) defined by (25.8.2) and (25.8.3),
considered as a function of the complex variable s, is absolutely convergent
for all $\mathrm{Re}(s) > \frac{3}{2}$ and defines a nonvanishing holomorphic function in that
region.

The estimate (25.8.6) in Theorem 485 says that the Dirichlet coefficients
of L(E, s) satisfy $|a_n| \le d(n)\sqrt{n}$. Theorem 315 tells us that the divisor
function d(n) is quite small,

    $d(n) = O(n^\delta)$ for any $\delta > 0$.

We write $\sigma = \mathrm{Re}(s)$ and estimate the Dirichlet series (25.8.3) by

    $\sum_{n \ge 1} \left|\frac{a_n}{n^s}\right| \le \sum_{n \ge 1} \frac{d(n)n^{1/2}}{n^\sigma} \le C\sum_{n \ge 1} \frac{1}{n^{\sigma - 1/2 - \delta}}$,

where C is a constant depending only on $\delta$. Hence the Dirichlet series is
absolutely convergent for $\mathrm{Re}(s) > \frac{3}{2} + \delta$, and
since $\delta$ is arbitrary, L(E, s) defines a holomorphic function on $\mathrm{Re}(s) > \frac{3}{2}$.
Finally, the nonvanishing of L(E, s) on the region $\mathrm{Re}(s) > \frac{3}{2}$ follows from
its product expansion (25.8.2).

Although the series (25.8.2) defining L(E, s) only converges for $\mathrm{Re}(s) > \frac{3}{2}$,
the function that it defines is similar to the Riemann $\zeta$-function in the
sense that it has an analytic continuation and satisfies a functional equation.
The next theorem represents a pinnacle of modern number theory, but its
proof is far beyond the scope of this book.

Theorem 487*. The L-series L(E, s) has an analytic continuation to the
entire complex plane. Further, there is an integer $N_E$, the conductor of E,
that divides the discriminant $\Delta$ such that the function

    $\xi(E, s) = N_E^{s/2}(2\pi)^{-s}\Gamma(s)L(E, s)$

satisfies the functional equation

    $\xi(E, 2 - s) = \pm\xi(E, s)$ for all $s \in \mathbb{C}$.

The L-series of an elliptic curve is built up out of purely local (mod p)
information. A conjecture of Birch and Swinnerton-Dyer predicts that
L(E, s) contains a significant amount of global information concerning the
rational points on the curve. For example, they conjecture that the order of
vanishing of L(E, s) at s = 1 equals the rank of the group of rational points
E(Q). In particular, L(E, 1) should vanish if and only if E(Q) contains
infinitely many points. The small amount of progress that has been made
on the conjecture of Birch and Swinnerton-Dyer, as described in the next
theorem, requires a vast panoply of mathematical tools for its proof.

Theorem 488*. If $L(E, 1) \ne 0$, then E(Q) has rank 0; and if L(E, 1) =
0 and $L'(E, 1) \ne 0$, then E(Q) has rank 1.

25.9. Points of finite order and modular curves. We have seen in
§ 25.4 that any particular elliptic curve has only finitely many points of
finite order having rational coordinates. In this section, we change our
perspective and attempt to classify all elliptic curves having a point of a
given finite order. Thus, for a given integer $N \ge 1$, we aim to describe the
set of ordered pairs

(25.9.1) {(E, P) : E is an elliptic curve and P is a point of exact order N on E},

up to the natural equivalence relation in which any two pairs $(E_1, P_1)$
and $(E_2, P_2)$ are considered to be identical if there is an isomorphism
$\phi : E_1 \to E_2$ satisfying $\phi(P_1) = P_2$. This is an example of what is known
as a moduli problem.

For example, if N = 1, then we simply want to classify elliptic curves
up to isomorphism. We already know how to do this using the j-invariant,
since two curves $E_1$ and $E_2$ are isomorphic if and only if their j-invariants
$j(E_1)$ and $j(E_2)$ are equal, cf. Theorem 461.

Theorem 489. Let E be an elliptic curve given by an equation (25.1.3)
with coefficients in a field k, and let $P \in E(k)$ be a point with coordinates
in k and satisfying $2P \ne O$ and $3P \ne O$. Then there is a change of
coordinates (25.3.8) with $u, r, s, t \in k$ that transforms E into an equation
of the form

(25.9.2) $y^2 + (w + 1)xy + vy = x^3 + vx^2$ with P = (0, 0).

The discriminant of the elliptic curve (25.9.2) is

(25.9.3) $\Delta = -v^3(w^4 + 3w^3 + 8vw^2 + 3w^2 - 20vw + w + 16v^2 - v)$.

The values of w and v are uniquely determined by E and P.

Proof. We begin with the transformation

    $x \mapsto x + x_P$ and $y \mapsto y + y_P$,

which has the effect of moving P to the point (0, 0) and puts E into the
form

    $y^2 + A_1y = x^3 + B_1x^2 + C_1x$.

The assumption that $2P \ne O$ tells us that $A_1 \ne 0$ (cf. Theorem 464), so
the substitution

    $y \mapsto y + (C_1/A_1)x$

puts E into the form

(25.9.4) $y^2 + A_2xy + B_2y = x^3 + C_2x^2$.

We note that the nonvanishing of the discriminant of (25.9.4) implies
that $B_2 \ne 0$. Further, since $2P = (-C_2, A_2C_2 - B_2)$, we see that

    $3P = O \iff 2P = -P \iff x_{2P} = x_P \iff C_2 = 0$.

Thus our assumption that $3P \ne O$ implies that $C_2 \ne 0$, so we may make
the substitutions

    $x \mapsto (B_2/C_2)^2x$ and $y \mapsto (B_2/C_2)^3y$.

This puts E into the desired form (25.9.2) with $w = A_2C_2/B_2 - 1$ and
$v = C_2^3/B_2^2$.

The formula for the discriminant of (25.9.2) follows directly from the 
general discriminant formula (25.3.7). 

In order to see that w and v are uniquely determined, we look at which 
change of variables (25.3.8) preserves the form of the equation (25.9.2) 
while simultaneously fixing the point (0, 0). The assumption that (0, 0) is 
fixed means that r = t = 0 in (25.3.8), and then the substitutions $x \mapsto u^2 x$
and $y \mapsto u^3 y + u^2 s x$ transform (25.9.2) into

(25.9.5)  $y^2 + u^{-1}(w + 1 + 2s)xy + u^{-3}vy = x^3 + u^{-2}(v - s^2 - (w + 1)s)x^2 - u^{-4}vsx$.

Comparing the x terms of (25.9.2) and (25.9.5) shows that s = 0 (note that
$v \ne 0$ since $\Delta \ne 0$), and then the y and $x^2$ terms show that $u^3 = u^2 = 1$,
so u = 1. Hence only the identity transformation preserves both equation
(25.9.2) and the point (0, 0), and thus w and v are uniquely determined by 
E and P. □ 
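The three substitutions used in the proof can be traced numerically. The following Python sketch is an illustration only, not part of the original argument. It assumes that (25.1.3) is the short form $y^2 = x^3 + Ax + B$ and that (25.3.7) agrees with the standard discriminant formula in terms of $b_2, b_4, b_6, b_8$; the function names are hypothetical.

```python
from fractions import Fraction

def tate_normal_form(A, B, xP, yP):
    """Return (w, v) of (25.9.2) for y^2 = x^3 + A*x + B and P = (xP, yP).

    Assumes (25.1.3) is the short Weierstrass form; needs 2P != O and 3P != O.
    """
    A, B, xP, yP = (Fraction(t) for t in (A, B, xP, yP))
    if yP**2 != xP**3 + A*xP + B:
        raise ValueError("P does not lie on the curve")
    # Step 1: x -> x + xP, y -> y + yP gives y^2 + A1*y = x^3 + B1*x^2 + C1*x.
    A1, B1, C1 = 2*yP, 3*xP, 3*xP**2 + A
    if A1 == 0:
        raise ValueError("2P = O")
    # Step 2: y -> y + (C1/A1)*x removes the linear term in x.
    lam = C1 / A1
    A2, B2, C2 = 2*lam, A1, B1 - lam**2
    if C2 == 0:
        raise ValueError("3P = O")
    # Step 3: rescaling by B2/C2 makes the y and x^2 coefficients equal.
    return A2*C2/B2 - 1, C2**3 / B2**2

def discriminant_general(a1, a2, a3, a4, a6):
    """Standard discriminant of y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6."""
    b2 = a1*a1 + 4*a2
    b4 = 2*a4 + a1*a3
    b6 = a3*a3 + 4*a6
    b8 = a1*a1*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3*a3 - a4*a4
    return -b2*b2*b8 - 8*b4**3 - 27*b6*b6 + 9*b2*b4*b6

def discriminant_25_9_3(w, v):
    """Closed form (25.9.3) for the curve (25.9.2)."""
    return -v**3 * (w**4 + 3*w**3 + 8*v*w**2 + 3*w**2 - 20*v*w + w + 16*v**2 - v)

# Example: the point P = (2, 3) on y^2 = x^3 + 1.
w, v = tate_normal_form(0, 1, 2, 3)
print(w, v)   # prints 1/3 2/9
```

For this sample point the intermediate values are $A_1 = 6$, $B_1 = 6$, $C_1 = 12$, then $A_2 = 4$, $B_2 = 6$, $C_2 = 2$. Comparing `discriminant_general(w + 1, v, v, 0, 0)` with `discriminant_25_9_3(w, v)` on a grid of integers gives a quick consistency check of (25.9.3).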
We now show that solving our moduli problem (25.9.1) is equivalent to
describing the solutions to a certain polynomial equation. In other words,
the set of pairs (E, P) consisting of an elliptic curve E and a point P of
order N is naturally parametrized by the solutions of a polynomial equation
$\Psi_N(W, V) = 0$.

Theorem 490. For any given values of w and v such that the discriminant
(25.9.3) does not vanish, let $E_{w,v}$ be the elliptic curve

(25.9.6)  $E_{w,v} : y^2 + (w + 1)xy + vy = x^3 + vx^2$

and let $P_{w,v} = (0, 0) \in E_{w,v}$. Let $N \ge 4$ be an integer.

(a) There is a nonzero polynomial $\Psi_N(W, V)$ with integer coefficients
having the property that $P_{w,v}$ is a point of order N if and only if
$\Psi_N(w, v) = 0$.

(b) Let E be any elliptic curve given by an equation with coefficients in a
field k and let $Q \in E(k)$ be a point of exact order N. Then there is a
change of variables (25.3.8) with $u, r, s, t \in k$ that puts E into the form
(25.9.6) and sends Q to P = (0, 0). The curve E and point Q uniquely
determine w and v.


Proof. (a) We treat $E_{W,V}$ as an elliptic curve over the field $\mathbb{Q}(W, V)$ of
rational functions in two variables. Then the coordinates of the multiples of

$P_{W,V} = (0, 0) \in E_{W,V}$

are quotients of polynomials in $\mathbb{Q}[W, V]$. More precisely, since the ring
$\mathbb{Q}[W, V]$ has unique factorization, an argument similar to that used in
Theorem 472 shows that if $NP_{W,V} \ne O$, then we can write $NP_{W,V}$ as

$NP_{W,V} = \left( \dfrac{\Phi_N(W, V)}{\Psi_N(W, V)^2}, \dfrac{\Omega_N(W, V)}{\Psi_N(W, V)^3} \right)$  with  $\Phi_N, \Psi_N, \Omega_N \in \mathbb{Z}[W, V]$.

The polynomial $\Psi_N(W, V)$ vanishes at (W, V) = (w, v) if and only
if $P_{w,v} \in E_{w,v}$ is a point of order N, so it remains to prove that
$NP_{W,V} \ne O$.

We first consider the multiple

$4P_{W,V} = \left( \dfrac{V(V - W)}{W^2}, \; -\dfrac{V^2 (W^2 - W + V)}{W^3} \right)$.

From this formula for $4P_{W,V}$ we see that for most choices of integers w
and v, the coordinates of the point $4P_{w,v}$ are fractions that are not integers.
For example, this is the case if |w| > 1 and gcd(w, v) = 1. It follows from
Theorem 466 that for such integer values of w and v, the point $4P_{w,v}$ is not
a point of finite order, and hence that $nP_{w,v} \ne O$ for all $n \ge 1$. This implies
that $nP_{W,V} \ne O$ for all $n \ge 1$ when we treat W and V as indeterminates,
since otherwise $P_{w,v} \in E_{w,v}$ would have finite order when we substitute
particular values for W and V.

(b) This is the special case of Theorem 489 in which we start with a point
of finite order $N \ge 4$. □
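The role played by the multiple $4P$ in part (a) can be checked numerically. The sketch below is illustrative only. It implements the standard chord-and-tangent formulas for a general Weierstrass equation (assumed here to agree with the group law of this chapter) and compares $4P_{w,v}$ with the closed form $\big(v(v-w)/w^2,\ -v^2(w^2-w+v)/w^3\big)$, which is our own computation obtained by applying the duplication formula twice. The helper names are hypothetical.

```python
from fractions import Fraction

def ec_add(P, Q, coeffs):
    """Add P and Q on y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6.

    Points are (x, y) tuples of Fractions; None stands for the point O.
    """
    a1, a2, a3, a4, a6 = coeffs
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 + y2 + a1*x2 + a3 == 0:          # Q = -P
            return None
        # Tangent slope (here P = Q).
        lam = (3*x1*x1 + 2*a2*x1 + a4 - a1*y1) / (2*y1 + a1*x1 + a3)
    else:
        lam = (y2 - y1) / (x2 - x1)            # chord slope
    nu = y1 - lam*x1
    x3 = lam*lam + a1*lam - a2 - x1 - x2
    y3 = -(lam + a1)*x3 - nu - a3
    return (x3, y3)

def four_P(w, v):
    """Compute 4*(0, 0) on E_{w,v} by two doublings (requires w, v nonzero)."""
    w, v = Fraction(w), Fraction(v)
    coeffs = (w + 1, v, v, Fraction(0), Fraction(0))
    P = (Fraction(0), Fraction(0))
    P2 = ec_add(P, P, coeffs)
    return ec_add(P2, P2, coeffs)

def four_P_closed_form(w, v):
    """Closed form derived by hand (an assumption of this sketch)."""
    w, v = Fraction(w), Fraction(v)
    return (v*(v - w) / w**2, -v**2 * (w**2 - w + v) / w**3)

print(four_P(2, 1))   # x = -1/4, y = -3/8: not integers
```

With w = 2 and v = 1 we have |w| > 1 and gcd(w, v) = 1, and the coordinates of $4P_{w,v}$ have denominators 4 and 8, as the argument in the proof requires.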


Here are the polynomials $\Psi_N(W, V)$ for some small values of N:

$\Psi_5(W, V) = W - V$,

$\Psi_6(W, V) = W^2 - W + V$,

$\Psi_7(W, V) = W^3 - VW + V^2$,

$\Psi_8(W, V) = VW^3 + W^3 - 3VW^2 + 2V^2W$,

$\Psi_9(W, V) = W^5 - W^4 + VW^3 + W^3 - 3VW^2 + 3V^2W - V^3$.

The polynomials $\Psi_5$ and $\Psi_6$ are linear in V, so we can eliminate V from the
equation $\Psi_N(W, V) = 0$ and create a universal one-parameter family of
elliptic curves with a point of order 5 or 6. For example, up to isomorphism,
every elliptic curve with a point P of order 6 can be put into the form

$y^2 + (w + 1)xy + (w - w^2)y = x^3 + (w - w^2)x^2$,  $P = (0, 0)$.

It is also possible to parametrize the solutions to $\Psi_N(W, V) = 0$ for N = 7,
8, and 9. For example, the curve $\Psi_7(W, V) = 0$ may be parametrized using
the parameter Z = V/W. Then $W = Z - Z^2$ and $V = Z^2 - Z^3$, so every
elliptic curve with a point P of order 7 can be put in the form

$y^2 + (1 + z - z^2)xy + (z^2 - z^3)y = x^3 + (z^2 - z^3)x^2$,  $P = (0, 0)$.
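As a sanity check on the first entries of this table, one can pick rational solutions of $\Psi_N = 0$ and compute the order of (0, 0) directly. The following Python sketch is illustrative: the helper names are ours, the addition formulas are the standard ones for a general Weierstrass equation, and the sample parameters (w, v) = (2, 2), (2, -2) and z = 2 are arbitrary choices for which the discriminant (25.9.3) is nonzero.

```python
from fractions import Fraction

def ec_add(P, Q, coeffs):
    """Group law on y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6 (None is O)."""
    a1, a2, a3, a4, a6 = coeffs
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 + y2 + a1*x2 + a3 == 0:          # Q = -P
            return None
        lam = (3*x1*x1 + 2*a2*x1 + a4 - a1*y1) / (2*y1 + a1*x1 + a3)
    else:
        lam = (y2 - y1) / (x2 - x1)
    nu = y1 - lam*x1
    x3 = lam*lam + a1*lam - a2 - x1 - x2
    return (x3, -(lam + a1)*x3 - nu - a3)

def order_of_origin(w, v, bound=12):
    """Order of (0, 0) on E_{w,v}, or None if it exceeds `bound`."""
    w, v = Fraction(w), Fraction(v)
    coeffs = (w + 1, v, v, Fraction(0), Fraction(0))
    P = (Fraction(0), Fraction(0))
    Q = P                                      # Q = n*P at the start of step n
    for n in range(1, bound + 1):
        if Q is None:
            return n
        Q = ec_add(Q, P, coeffs)
    return None

psi = {
    5: lambda W, V: W - V,
    6: lambda W, V: W**2 - W + V,
    7: lambda W, V: W**3 - V*W + V**2,
}

z = 2
samples = {5: (2, 2), 6: (2, -2), 7: (z - z**2, z**2 - z**3)}
for N, (w, v) in samples.items():
    print(N, psi[N](w, v), order_of_origin(w, v))
```

Each printed line shows N, the value $\Psi_N(w, v) = 0$, and the computed order N. For contrast, (w, v) = (2, 1) lies on none of these curves and `order_of_origin(2, 1)` returns `None`.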

However, as the value of N increases, it is no longer possible to describe
the solutions to $\Psi_N(W, V) = 0$ using a single parameter. The modular curve
$X_1(N)$ is defined to be the plane curve given by the equation†

$X_1(N) = \{(w, v) : \Psi_N(w, v) = 0\}$.

The increasing complexity of $X_1(N)$ as N increases may be measured by
studying the points of $X_1(N)$ having complex coordinates, i.e. the complex
solutions to the equation $\Psi_N = 0$. For $N \le 10$ and N = 12, the complex
points $X_1(N)(\mathbb{C})$ form a sphere (a 0-holed torus),‡ and it is exactly in these
cases that $X_1(N)$ is parametrizable by a single parameter. The curves $X_1(11)$
and $X_1(14)$ turn out themselves to be elliptic curves, so their complex points
are 1-holed tori. As N increases, the complex points $X_1(N)(\mathbb{C})$ form a $g_N$-
holed torus, where the genus $g_N$ goes to infinity with N. For prime values
of N, the genus $g_N$ is approximately $N^2/24$.

Mazur used modular curves to prove the following strong uniformity 
bound for rational points of finite order on elliptic curves. 

Theorem 491*. Let E be an elliptic curve given by an equation with
rational coefficients and let $P \in E(\mathbb{Q})$ be a point of exact order N. Then
either $N \le 10$ or N = 12.

In order to prove Theorem 491, one shows that if N = 11 or $N \ge 13$,
then the only solutions to $\Psi_N(w, v) = 0$ in rational numbers w and v are
solutions for which the discriminant (25.9.3) vanishes. Since such solutions
(w, v) do not correspond to actual elliptic curves, Theorem 491 then follows
from Theorem 490. The proof that $\Psi_N(w, v) = 0$ has no nontrivial rational
solutions requires a detailed analysis of the curve $X_1(N)$ and deep tools
from modern algebraic geometry.

25.10. Elliptic curves and Fermat’s last theorem. Fermat’s last the- 
orem, already alluded to in Chapter XIII, was stated by Fermat in the 17th 
century and proven by Andrew Wiles in the 20th. 

Theorem 492*. Let $n \ge 3$ be an integer. Then the equation

$a^n + b^n = c^n$

has no solutions in nonzero integers a, b, c.

† This definition of $X_1(N)$ is not quite accurate, although it will suffice for our purposes. In general,
the equation $\Psi_N = 0$ has singularities and is missing points ‘at infinity.’ The correct definition of $X_1(N)$
is that it is the desingularization of the compactification of the curve $\Psi_N = 0$.

‡ For example, $X_1(5)(\mathbb{C})$ is the compactification of the set $\{(w, v) \in \mathbb{C}^2 : w - v = 0\}$. This set is a
copy of the complex plane $\mathbb{C}$, and the (one point) compactification of $\mathbb{C}$ is a two-dimensional sphere.

It clearly suffices to prove Theorem 492 for n = 4 and n = p an odd
prime, and since Theorems 226 and 228 cover the cases n = 4 and n = 3,
respectively, it suffices to prove that there are no solutions in nonzero
integers to the equation

(25.10.1)  $a^p + b^p = c^p$,  where $p \ge 5$ is prime.

Dividing by any common factor, we may further assume that a, b, and c
are pairwise relatively prime.

Setting u = a/c and v = b/c, Fermat’s last theorem reduces to the
statement that the equation

(25.10.2)  $u^p + v^p = 1$

has no solutions in nonzero rational numbers u and v. This equation defines
a curve, but it is most definitely not an elliptic curve.† So instead of working
directly with (25.10.2), we use a hypothetical solution to (25.10.1) to define
an elliptic curve

$E_{a,b,c} : Y^2 = X(X + a^p)(X - b^p)$.

Using the general discriminant formula (25.3.7) from § 25.3, we find that
the discriminant of $E_{a,b,c}$ is‡

(25.10.3)  $\Delta_{a,b,c} = 16a^{2p}b^{2p}(a^p + b^p)^2 = 16(abc)^{2p}$.
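The first equality in (25.10.3) is a polynomial identity and can be checked mechanically. In the Python sketch below, the integers A and B are stand-ins for $a^p$ and $b^p$ (no actual Fermat solution exists, so the sample values are arbitrary), and the discriminant formula is the standard one in terms of $b_2, b_4, b_6, b_8$, which we assume agrees with (25.3.7). The function names are hypothetical.

```python
def discriminant_general(a1, a2, a3, a4, a6):
    """Standard discriminant of y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6."""
    b2 = a1*a1 + 4*a2
    b4 = 2*a4 + a1*a3
    b6 = a3*a3 + 4*a6
    b8 = a1*a1*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3*a3 - a4*a4
    return -b2*b2*b8 - 8*b4**3 - 27*b6*b6 + 9*b2*b4*b6

def frey_discriminant(A, B):
    """Discriminant of Y^2 = X(X + A)(X - B) = X^3 + (A - B)X^2 - A*B*X."""
    return discriminant_general(0, A - B, 0, -A*B, 0)

# Arbitrary sample values standing in for (a^p, b^p).
for A, B in [(1, 2), (3**5, 2**5), (-7, 4)]:
    print(frey_discriminant(A, B) == 16 * A**2 * B**2 * (A + B)**2)   # True
```

The second equality in (25.10.3) is where the hypothetical relation $a^p + b^p = c^p$ enters: it replaces the factor $(a^p + b^p)^2$ by $c^{2p}$.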

An elliptic curve whose discriminant is (essentially) a perfect 2pth power 
would be a strange animal, indeed! The proof of Fermat’s last theorem lies 
in showing that such a curve cannot exist and comes down to proving the 
following two statements: 

• The elliptic curve $E_{a,b,c}$ is not modular.

• The elliptic curve $E_{a,b,c}$ is modular.

There are a number of equivalent definitions of what it means for an 
elliptic curve to be modular, but unfortunately, as bare definitions, they 
are not very illuminating. In keeping with the scope of this book, we 
give a definition that is purely algebraic, but we note that the underlying 
motivation lies in the analytic theory of modular forms and L-series. 

† The complex points of the compactified Fermat curve $u^n + v^n = 1$ form a $\frac{1}{2}(n-1)(n-2)$-holed
torus, so the Fermat curve is an elliptic curve only for n = 3.

‡ After a simple change of variables, the discriminant (25.3.7) becomes simply $(abc)^{2p}$.

For each $N \ge 1$ we defined in § 25.9 the modular curve $X_1(N)$ whose
points classify pairs (C, P) consisting of an elliptic curve C and a point
P of order N. (We call the elliptic curve C to distinguish it from E.) We
now say that an elliptic curve E is modular if E can be covered by some
modular curve, i.e. if there is a covering map

(25.10.4)  $X_1(N) \to E$

defined by rational functions. The smallest N for which there exists a
covering map (25.10.4) is called the conductor of E.

After Frey suggested that the elliptic curves $E_{a,b,c}$ created from putative
Fermat equation solutions should not be modular, Serre described a
‘level-lowering’ conjecture which implied that if $E_{a,b,c}$ were modular, then the
special form (25.10.3) of its discriminant would force the conductor to
divide 4. But the complex points of $X_1(N)$ for $N \le 4$ are spheres (0-holed
tori), and a sphere cannot be continuously mapped onto the complex points
of an elliptic curve (a 1-holed torus). Ribet subsequently proved Serre’s
conjecture, which showed that Frey’s intuition was correct: the elliptic
curve $E_{a,b,c}$ is not modular.

It is not clear why this should be surprising. The points of X\ (N) solve 
a classification problem related to elliptic curves, but there is no reason, 
a priori, to expect any particular elliptic curve to admit a covering map 
from some X\ ( N ). However, earlier work of Eichler, Shimura, Taniyama, 
and Weil suggested that every elliptic curve given by an equation with 
rational coefficients should be modular.

Thus the final step in the proof of Fermat’s last theorem was to show 
that all, or at least most, elliptic curves are modular. This was done by 
Wiles, who, with assistance from Taylor for one step of the proof, proved
that every semistable elliptic curve is modular.† Since the $E_{a,b,c}$ curves, if
they existed, would be semistable, this completed the proof of Fermat’s
last theorem. Building on Wiles’ work, Breuil, Conrad, Diamond, and 
Taylor subsequently completed the proof of the full modularity conjecture, 
whose proof is far beyond the scope of this book. 

Theorem 493*. Every elliptic curve given by an equation with rational 
coefficients is modular. 


† Aside from some special conditions at 2 and 3, an elliptic curve $Y^2 = X^3 + AX + B$ is semistable
if gcd(A, B) = 1.

NOTES

§25.1. Some cases of rational right triangles with rational area were studied in ancient 
Greece, but the systematic study of congruent numbers began with Arab scholars during 
the 10th century. Arab mathematicians tended to use the equivalent characterization, also 
known to the Greeks, that n is a congruent number if and only if there is a rational number 
x such that both $x^2 + n$ and $x^2 - n$ are squares of rational numbers. See Dickson, History,
ii, ch. xvi, for additional information on the mathematical history of congruent numbers. 

There exists a vast literature on elliptic curves,† including many textbooks devoted to
their number theoretic properties. The reader may consult the books of Cassels, Knapp,
Koblitz, Lang, Silverman, and Silverman-Tate for proofs of the unproven theorems in this
chapter (other than those in §§ 25.8-25.10) and for much additional basic material.

§ 25.2. The genesis of the name ‘elliptic curve’ is from the integrals that arise when
computing the arc length of an ellipse. After an algebraic substitution, such integrals take
the form $\int R(x)\,dx/\sqrt{x^3 + Ax + B}$ for some rational function R(x). These elliptic integrals
may be viewed as integrals $\int R(x)\,dx/y$ on the curve (Riemann surface) $y^2 = x^3 + Ax + B$,
hence the name elliptic curve.

Special cases of the duplication and composition law on elliptic curves, described alge- 
braically, date back to Diophantus, but it appears that the first geometric description via 
secant lines is due to Newton, Mathematical Papers, iv, 1674-1684, Camb. Univ. Press,
1971, 110-115. A nice historical survey of the composition law is given by Schappacher,
Sém. Théor. Nombres Paris 1988-1989, Progr. Math. 91 (1990), 159-84.

A proof that addition on an elliptic curve is associative (Theorem 462(c)) may be found 
in the standard texts listed earlier. 

Theorem 463 was first observed by Poincaré, Jour. Math. Pures Appl. 7 (1901).

Elliptic curves with complex multiplication have many special properties not shared 
by general elliptic curves. In particular, if the endomorphism ring of such a curve E is a 
subring of the quadratic imaginary field k, then Abel, Jacobi, Kronecker,... proved that the 
coordinates of the points of finite order in E can be used to generate abelian extensions 
of k that are natural analogues of the cyclotomic extensions of $\mathbb{Q}$, i.e. the extensions of $\mathbb{Q}$
generated by roots of unity. In particular, k(j(E)) is the Hilbert class field of k, the maximal
abelian unramified extension of k.

§ 25.3. It is easy to create a Weierstrass equation that is minimal except possibly for 
the primes 2 and 3. An algorithm of Tate (Lecture Notes in Math. (Springer), 476 (1975),
33-52) handles all primes.

§ 25.4. Theorem 466 was proven independently by Nagell (Wid. Akad. Skrifter Oslo I,
1 (1935)) and Lutz (J. Reine Angew. Math. 177 (1937), 237-47). The proof that we give
follows Tate’s 1961 Haverford lectures as they appear in Silverman-Tate, Rational points
on elliptic curves.

A modern formulation of Theorem 469 says that the group of p-adic points $E(\mathbb{Q}_p)$ has a
filtration by subgroups $E_k(\mathbb{Q}_p) = \{(z, w) \in E(\mathbb{Q}_p) : v_p(z) \ge k\}$ for k = 1, 2, .... Further,
the map $P \mapsto z_P$ induces an isomorphism $E_k(\mathbb{Q}_p)/E_{k+1}(\mathbb{Q}_p) \to p^k\mathbb{Z}_p/p^{k+1}\mathbb{Z}_p$. The groups
$E_1(\mathbb{Q}_p)$ and $p\mathbb{Z}_p$ are isomorphic as p-adic Lie groups via a map $P \mapsto \ell_p(z_P)$, where
$\ell_p(T) \in \mathbb{Q}_p[[T]]$ is a certain p-adically convergent power series.

See also Theorem 491 and the notes for Section 25.9 for uniform bounds for points of 
finite order. 

§ 25.5. Theorem 470 is due to Mordell, Proc. Camb. Philos. Soc. 21 (1922), 179-92.
It was generalized by Weil (Acta Math. 52 (1928), 281-315) to number fields and to
abelian varieties (higher dimensional analogues of elliptic curves), and thus is known as
the Mordell-Weil theorem. Theorem 475, or more generally the finiteness of the quotient
$E(\mathbb{Q})/mE(\mathbb{Q})$ for all $m \ge 1$, is called the ‘weak’ Mordell-Weil theorem. The structure
theorem for finitely generated abelian groups is well-known and may be found in any basic
algebra text.

† MathSciNet lists almost 2000 papers whose title includes the phrase ‘elliptic curve’.

It is conjectured that there are elliptic curves for which E(Q) has arbitrarily high rank. 
The largest known example is a curve of rank at least 28 that was discovered by Elkies in 
May 2006. (See Elkies’ survey article, arxiv.org/abs/0709.2908.)

Somewhat surprisingly, there is still no proven algorithm for computing the group of 
rational points on an elliptic curve. All known proofs of Theorem 475 are ineffective in 
the sense that they do not provide an algorithm for constructing a suitable set of points 
covering all of the congruence classes in the finite quotient group E(Q)/2E(Q). 
If such points are known, then the remainder of the proof of Theorem 470 is effective, since 
the constants in Theorem 476 may easily be made effective. There is also an algorithm, 
due to Manin ( Russian Math. Surveys , (6) 26 (1971), 7-78), that is effective conditional on 
various standard, but very deep, conjectures. In practice, there are powerful computer 
programs, such as Cremona’s mwrank (www.maths.nott.ac.uk/personal/jec/mwrank/), 
that are usually able to compute generators for $E(\mathbb{Q})$ if the coefficients of E are not
too large. 

Theorem 476 suggests that the height function $h : E(\mathbb{Q}) \to [0, \infty)$ resembles a quadratic
form. Néron (Ann. of Math. (2) 82 (1965), 249-331) and Tate (unpublished) proved that
the limit $\hat{h}(P) = \lim_{n \to \infty} n^{-2} h(nP)$ exists, differs from h by O(1), and is a quadratic form
on $E(\mathbb{Q})$ whose extension to $E(\mathbb{Q}) \otimes \mathbb{R}$ is nondegenerate. The function $\hat{h}$, which is called
the canonical (or Néron-Tate) height, has many applications. For example, Néron (op. cit.)
showed that $\#\{P \in E(\mathbb{Q}) : h(P) \le T\} \sim C_E T^{\frac{1}{2} \operatorname{rank} E}$ as $T \to \infty$.

§ 25.6. Theorem 477 is due to Hasse, Vorläufige Mitteilung, Nachr. Ges. Wiss. Göttingen
I, Math.-Phys. Kl. Fachgr. I Math. 42 (1933), 253-62. A vast generalization to varieties
of arbitrary dimension was proposed by Weil (Bull. Amer. Math. Soc. 55 (1949), 497-508)
and proven by Deligne (IHES Publ. Math. 43 (1974), 273-307).

It is an interesting computational problem to compute $\#E(\mathbb{F}_p)$ when p is large. The first
polynomial time algorithm is due to Schoof (Math. Comp. 44 (1985), 483-94), who also
used it to give the first polynomial time algorithm for computing square roots in $\mathbb{F}_p$. A more
practical version, although not provably polynomial time, was devised by Elkies and Atkin
and is now known as the SEA algorithm (J. Théor. Nombres Bordeaux 7 (1995), 219-54).
Satoh (J. Ramanujan Math. Soc. 15 (2000), 247-70) used cohomological ideas to give a
faster algorithm to count $\#E(\mathbb{F}_q)$ when q is a large power of a small prime. Such point
counting algorithms have applications to cryptography.

Given two points P and Q in $E(\mathbb{F}_p)$ such that Q is a multiple of P, the problem of
determining an integer m with Q = mP is called the elliptic curve discrete logarithm
problem (ECDLP). The fastest known algorithms for solving the ECDLP are collision
algorithms that take $O(\sqrt{p})$ steps. These exponential-time algorithms may be contrasted
with the subexponential index calculus, which solves the analogous problem for $\mathbb{F}_p^*$ in
$O\big(e^{c (\log p)^{1/3} (\log \log p)^{2/3}}\big)$ steps. The lack of an efficient algorithm to solve the ECDLP
led Koblitz (Math. Comp. 48 (1987), 203-9) and V. Miller (Lecture Notes in Comput. Sci.
(Springer), 218 (1986), 417-26) independently to suggest the use of elliptic curves for the
construction of public key cryptographic protocols. Thus in addition to any purely intrinsic
mathematical interest that the ECDLP might inspire, the existence or nonexistence of faster
algorithms to solve the ECDLP is of great practical and financial importance.

§ 25.7. Theorem 478 is due to V.A. Lebesgue (1869) and Theorem 479 is due to Fermat.
Theorem 483 is due to Siegel (J. London Math. Soc. 1 (1926), 66-68 and Collected
Works, Springer, 1966, 209-66), who gave two different proofs, neither of which provided
an effective bound for the size of the solutions. This was remedied by Baker (J. London
Math. Soc. 43 (1968), 1-9), whose estimates for linear forms in logarithms (Mathematika
13 (1966), 204-16; 14 (1967), 102-7; 14 (1967), 220-8) provide effective Diophantine
approximation estimates that can be used to prove effective bounds for integer points on 
elliptic curves. Building on work of Vojta (Ann. of Math. 133 (1991), 509-48), Faltings 
(Ann. of Math. 133 (1991), 549-76) generalized Siegel’s theorem by proving that an affine 
subvariety of an abelian variety has only finitely many integral points. 

It is trivial to produce Weierstrass equations (25.1.3) having arbitrarily many integer
solutions by clearing the denominators of rational solutions. Using this method, Silverman
(J. London Math. Soc. 28 (1983), 1-7) showed that if there exists an elliptic curve E
whose group of rational points $E(\mathbb{Q})$ has rank r, then there exist infinitely many Weierstrass
equations (25.1.3) having $\gg (\log \max\{|A|, |B|\})^{r/(r+2)}$ integer solutions.

Lang (Elliptic Curves: Diophantine Analysis, Springer, 1978, page 140) conjectured
that the number of integer points on a minimal Weierstrass equation should be bounded by
a quantity depending only on the rank of the group of rational points. This conjecture was
proven for elliptic curves with integral j-invariant by Silverman (J. Reine Angew. Math.
378 (1987), 60-100) and, conditional on the abc-conjecture of Masser and Oesterlé (see
notes to ch. XIII), for all elliptic curves by Hindry and Silverman (Invent. Math. 93 (1988),
419-50).

§ 25.8. The quantity $a_p$ defined by (25.8.1) is called the trace of Frobenius, because it
is the trace of the p-power Frobenius map in the Galois group $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ acting as a linear
map on the group of points of $\ell$-power order in E, where $\ell$ is any prime other than p.

A conjecture of Sato and Tate (independently) describes the variation of $a_p$, and thus of
$\#E(\mathbb{F}_p)$, as p varies. Theorem 477 says that there is an angle $0 \le \theta_p \le \pi$ such that
$\cos \theta_p = a_p / 2\sqrt{p}$. The Sato-Tate conjecture asserts that for $0 \le \alpha < \beta \le \pi$, the density
of $\{p : \alpha \le \theta_p \le \beta\}$ within the set of primes is $\frac{2}{\pi} \int_\alpha^\beta \sin^2(t)\,dt$. Taylor (IHES Publ. Math.
submitted 2006), building on earlier joint work with Clozel and M. Harris (IHES Publ.
Math. submitted 2006) and with M. Harris and Shepherd-Barron (Ann. of Math. to appear),
has proven the Sato-Tate conjecture for elliptic curves whose j-invariant is not an integer.

Theorem 487 was proven by Deuring (Nachr. Akad. Wiss. Gottingen. Math.-Phys. Kl. 
Math.-Phys.-Chem. Abt. (1953), 85-94) for elliptic curves with complex multiplication, by 
Wiles (Ann. of Math. 141 (1995), 443-551), with assistance from Taylor (Ann. of Math. 141
(1995), 553-72), for semistable elliptic curves (roughly, curves given by an equation (25.1.3)
with gcd(A, B) = 1), and in full generality by Breuil, B. Conrad, Diamond, and Taylor, J.
Amer. Math. Soc. 14 (2001), 843-939. See § 25.10 and its notes for the connections with 
Fermat’s last theorem. 

The conjecture that $\operatorname{ord}_{s=1} L(E, s) = \operatorname{rank} E(\mathbb{Q})$, and a refined version describing the
leading Taylor coefficient of L(E, s) at s = 1, were proposed by Birch and Swinnerton-
Dyer (J. Reine Angew. Math. 218 (1965), 79-108). An early partial result of Coates
and Wiles (Invent. Math. 39 (1977), 223-51) showed that if E has complex multiplication
and if $L(E, 1) \ne 0$, then $E(\mathbb{Q})$ is finite. Theorem 488 is an amalgamation of work
of Gross and Zagier (Invent. Math. 84 (1986), 225-320) and Kolyvagin (Izv. Akad.
Nauk SSSR Ser. Mat. 52 (1988), 522-40, 670-1), combined with the proof by Wiles et al.
of the Modularity Conjecture (essentially Theorem 487). The conjecture of Birch and
Swinnerton-Dyer is one of the seven Millennium Problems proposed by the Clay Mathematics
Institute (www.claymath.org/millennium/). Gross and Zagier (op. cit.) further show
that if L(E, 1) = 0 and $L'(E, 1) \ne 0$, then $L'(E, 1) = r\Omega\hat{h}(P)$, where $r \in \mathbb{Q}$, $\Omega$ is the value of
an elliptic integral, and $\hat{h}(P)$ is the canonical height of a point $P \in E(\mathbb{Q})$ constructed using
a method due to Heegner.

A weak form of the Birch-Swinnerton-Dyer conjecture implies that every integer
$m \equiv 5, 6, 7 \pmod{8}$ is a congruent number. Assuming the same weak form of the Birch-
Swinnerton-Dyer conjecture, Tunnell (Invent. Math. 72 (1983), 323-34) proved that if m
is a squarefree odd integer and if the number of integer solutions to $2x^2 + y^2 + 8z^2 = m$
is twice the number of integer solutions to $2x^2 + y^2 + 32z^2 = m$, then m is a congruent
number. He also showed that the converse holds unconditionally, and that similar results
hold for squarefree even integers.
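Tunnell's criterion for odd squarefree m is easy to test by brute force. The Python sketch below is only an illustration with hypothetical function names. It counts integer solutions of the two ternary forms and compares the counts. A result of `False` shows unconditionally that m is not a congruent number; a result of `True` shows that m is congruent only under the weak form of the Birch-Swinnerton-Dyer conjecture mentioned above.

```python
from math import isqrt

def count_solutions(m, c):
    """Number of integer triples (x, y, z) with 2x^2 + y^2 + c*z^2 = m (m > 0)."""
    count = 0
    x_max, z_max = isqrt(m // 2), isqrt(m // c)
    for x in range(-x_max, x_max + 1):
        for z in range(-z_max, z_max + 1):
            rest = m - 2*x*x - c*z*z
            if rest < 0:
                continue
            y = isqrt(rest)
            if y*y == rest:
                count += 1 if y == 0 else 2    # y and -y both count
    return count

def tunnell_counts_match(m):
    """Tunnell's test for an odd squarefree m (the caller must ensure this)."""
    return count_solutions(m, 8) == 2 * count_solutions(m, 32)

for m in (1, 3, 5, 7, 11, 13):
    print(m, count_solutions(m, 8), count_solutions(m, 32), tunnell_counts_match(m))
```

For m = 11, for instance, the counts are 12 and 4, so 11 is not a congruent number, while for m = 5, 7 and 13 both counts are 0, consistent with the statement about $m \equiv 5, 6, 7 \pmod 8$.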

§ 25.9. The analytic theory of modular curves and modular functions was extensively 
studied starting in the 19th century (see, e.g., Kiepert, Math. Ann. 32 (1888), 1-135 and 
37 (1890), 368-98) and continues to the present day. We have taken a purely algebraic 
approach, but the reader should be aware that in doing so, we have missed out on much of 
the theory. 

The history of Theorem 491 is quite interesting. Beppo Levi (Atti Accad. Sci. Torino 42
(1906), 739-64 and 43 (1908), 99-120, 413-34, 672-81) computed equations of various
modular curves $X_1(N)$ and proved that $X_1(N)$ has no nontrivial rational points for N = 14,
16, and 20, thereby showing that no elliptic curve can have a rational point of these orders.
Prime values of N are more difficult, with N = 11 being handled by Billing and Mahler (J.
London Math. Soc. 15 (1940), 32-43), N = 17 by Ogg (Invent. Math. 12 (1971), 105-11),
and N = 13 by Mazur and Tate (Invent. Math. 22 (1973), 41-9). Mazur then proved the
general result (Theorem 491) in IHES Publ. Math. 47 (1978), 33-186.

Mazur’s theorem was extended to quadratic number fields by Kamienny (Invent. Math.
109 (1992), 221-9), to number fields of degree at most 8 by Kamienny and Mazur, and
to number fields of degree at most 14 by Abramovich. Merel (Invent. Math. 124 (1996),
437-49) then proved uniform boundedness for all number fields. Merel’s theorem states
that a point of finite order in E(k) has order bounded by a constant depending only on the
degree of the number field k.

§ 25.10. After earlier work by Frey, Hellegouarch, Kubert, and others relating Fermat
curves and modular curves, Frey (Ann. Univ. Sarav. Ser. Math. 1 (1986), iv+40) suggested
that the $E_{a,b,c}$ curves should not be modular. Serre (Duke Math. J. 54 (1987), 179-230)
formulated a conjecture on modular representations that implies Frey’s conjecture. Ribet
(Invent. Math. 100 (1990), 431-76) then proved Serre’s conjecture, thereby showing that
$E_{a,b,c}$ is not modular.

Despite their strikingly different statements, Theorem 487 on the analytic continuation
of L-series and Theorem 493 on the modularity of elliptic curves are closely related to one
another via the theory of modular forms. Work of Eichler (Arch. Math. 5 (1954), 355-66),
Shimura (J. Math. Soc. Japan 10 (1958), 1-28), and Weil (Math. Ann. 168 (1967), 149-56)
shows that, up to some technical conditions, the two theorems are equivalent. Thus the
history of the proof of Theorem 487, which is described in the notes to § 25.8, is equally
the history of the proof of Theorem 493.

For a brief, but technical, overview of the proof of Fermat’s last theorem, see Stevens,
Modular forms and Fermat's last theorem, Springer, 1997, 1-15. And for the enterprising
reader, the remaining 550+ pages of this instructional conference proceedings provide
further details of the many pieces that fit snugly together to form a proof of this famous
350-year-old problem.

1. Another formula for $p_n$. We can use Theorem 80 to write down a
formula for $\pi(n)$ and so one for $p_n$. These formulae do not suffer from the
disadvantage of those described in § 22.3. In theory, they could be used
to calculate $\pi(n)$ and $p_n$, but at the cost of much heavier calculation than
the Sieve of Eratosthenes; indeed the calculation is prohibitive except for
fairly small n. It follows from Theorem 80 that

$(j - 2)! \equiv a \pmod{j}$,  $(j \ge 5)$

where a = 1 or 0, according as j is a prime or composite. Hence we have

$\pi(n) = 2 + \sum_{j=5}^{n} \left\{ (j - 2)! - j \left[ \dfrac{(j - 2)!}{j} \right] \right\}$  $(n \ge 5)$,

while $\pi(1) = 0$, $\pi(2) = 1$, and $\pi(3) = \pi(4) = 2$.

We now write

$f(x, x) = 0$,  $f(x, y) = \dfrac{1}{2} \left\{ 1 + \dfrac{x - y}{|x - y|} \right\}$  $(x \ne y)$,

so that f(x, y) = 1 or 0 according as x > y or $x \le y$. Then $f(n, \pi(j)) = 0$
or 1 according as $n \le \pi(j)$ or $n > \pi(j)$, i.e. as $j \ge p_n$ or $j < p_n$. But $p_n \le 2^n$
by Theorem 418. Hence

$1 + \sum_{j=1}^{2^n} f(n, \pi(j)) = 1 + \sum_{j=1}^{p_n - 1} 1 = p_n$.

This is our formula for $p_n$.
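The two formulae translate directly into code. The following Python sketch is a literal, deliberately inefficient transcription (the function names are ours); it is meant only to make the construction concrete and, as the text warns, is practical only for small n.

```python
from math import factorial

def prime_count(n):
    """pi(n) from the congruence (j-2)! = 1 or 0 (mod j) for j >= 5."""
    if n < 5:
        return (0, 0, 1, 2, 2)[n]              # pi(0), ..., pi(4)
    total = 2
    for j in range(5, n + 1):
        fact = factorial(j - 2)
        total += fact - j * (fact // j)        # 1 if j is prime, 0 if composite
    return total

def f(x, y):
    """f(x, y) = 1 if x > y and 0 if x <= y, written as in the text."""
    if x == y:
        return 0
    return (1 + (x - y) // abs(x - y)) // 2

def nth_prime(n):
    """p_n = 1 + sum of f(n, pi(j)) over 1 <= j <= 2^n."""
    return 1 + sum(f(n, prime_count(j)) for j in range(1, 2**n + 1))

print([nth_prime(n) for n in range(1, 7)])     # [2, 3, 5, 7, 11, 13]
```

The outer sum has $2^n$ terms and each term recomputes factorials, which illustrates why the calculation is prohibitive except for fairly small n.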

There is a considerable literature on formulae for primes of various kinds.
See, for example, Dudley (American Math. Monthly 76 (1969), 23-28),
Golomb (ibid. 81 (1974), 752-4) and Gandhi’s review of the latter paper
(Math. Rev. 50 (1975), 963), which give further references.

2. A generalization of Theorem 22. Theorem 22 can be generalized
to a larger number of variables. Thus suppose that $P_i(x_1, \ldots, x_k)$ and
$Q_i(x_1, \ldots, x_k)$ are polynomials with integer coefficients, that
$a_1, \ldots, a_m$ are positive integers and that

$F = F(x_1, \ldots, x_k) = \sum_{i=1}^{m} P_i(x_1, \ldots, x_k)\, a_i^{Q_i(x_1, \ldots, x_k)}$.

If F takes only prime values for all possible non-negative values of
$x_1, \ldots, x_k$, then F must be a constant. On the other hand, Davis, Matijasevič,
Putnam, and Robinson have shown how to construct a polynomial
$R(x_1, \ldots, x_k)$, all of whose positive values are prime for non-negative
integral values of $x_1, \ldots, x_k$ and for which the range of these positive values
is precisely the primes, but all of whose negative values are composite.
With k = 42, the degree of R need be no more than 5. The least value so far
found for k is 10, when the degree of R is 15905. See Matijasevič, Zapiski
naučn. Sem. Leningrad. Otd. mat. Inst. Steklov 68 (1977), 62-82 (Russian,
English summary) for this last result and Jones, Sato, Wada, and Wiens,
American Math. Monthly 83 (1976), 449-65 for an account of this whole
topic and full references.
3. Unsolved problems concerning primes. Apart from the correction 
of a trivial error, the unsolved problems listed in § 2.8 are the same as those 
listed in the first edition (1938) of this book. None of these conjectures has 
been proved or disproved in the intervening 70 years. But there have been 
substantial advances towards their proof and we describe some of them 
here. 

Goldbach enunciated his ‘theorem’ (mentioned in § 2.8) that every even
n > 3 is the sum of two primes in a letter to Euler in 1742. Vinogradov
proved in 1937 that every sufficiently large odd number is the sum of three
primes. Estermann, Introduction, gives Vinogradov’s proof. Let E(x) be
the number of even integers less than x which are not the sum of two primes.
Estermann, van der Corput, and Chudakov proved that E(x) = o(x) and
Montgomery and Vaughan (Acta Arith. 27 (1975), 353-70) improved this
to $E(x) = O(x^{1-\delta})$ for a suitable $\delta > 0$. See this last paper for references.
Ramaré (Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 22 (1995), 645-706) has
shown that every positive integer is a sum of at most 6 primes. As of 2007,
it has been verified that the Goldbach hypothesis is true for $n \le 5 \times 10^{17}$
(Oliveira e Silva, see http://www.ieeta.pt/tos/goldbach.html).

Let us write $P_2$ to denote any number that is a prime or the product of 
two primes. Chen has proved that every sufficiently large even number is 
a sum of a prime and a $P_2$ (see Ross, J. London Math. Soc. (2) 10 (1975), 
500-506 for the simplest proof) and also that there are infinitely many 
primes p such that p + 2 is a $P_2$. There is a $P_2$ between $n^2$ and $(n+1)^2$ 
(Chen, Sci. Sinica 18 (1975), 611-27) and, for all sufficiently large n, there 
is a prime between $n - n^{\theta}$ and n, where $\theta = 0.525$ (Baker, Harman, 
and Pintz, Proc. London Math. Soc. (3) 83 (2001), 532-562). All the results 
mentioned in this paragraph have been found by the modern sieve method; 
see Halberstam and Roth, ch. 4 for an elementary exposition and Halberstam 
and Richert for a fuller treatment. 
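
The statement about a $P_2$ between consecutive squares is easy to test for small n. The sketch below uses plain trial division, not a sieve method, and the helper names (`big_omega`, `is_p2`, `first_p2_between_squares`) are assumptions introduced here for illustration.

```python
def big_omega(m):
    """Number of prime factors of m, counted with multiplicity (trial division)."""
    count, d = 0, 2
    while d * d <= m:
        while m % d == 0:
            m //= d
            count += 1
        d += 1
    if m > 1:
        count += 1
    return count


def is_p2(m):
    """True when m is a prime or a product of two primes."""
    return m > 1 and big_omega(m) <= 2


def first_p2_between_squares(n):
    """Smallest P_2 strictly between n^2 and (n+1)^2, or None if there is none."""
    for m in range(n * n + 1, (n + 1) ** 2):
        if is_p2(m):
            return m
    return None
```

For example, `first_p2_between_squares(3)` inspects 10, 11, ..., 15 and stops at 10 = 2 · 5.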

Friedlander and Iwaniec (Ann. of Math. (2) 148 (1998), 945-1040) have 
shown that there are infinitely many primes of the form $a^2 + b^4$. Similarly 
Heath-Brown (Acta Math. 186 (2001), 1-84) has shown that there are 
infinitely many primes of the shape $a^3 + 2b^3$. This latter result has been 
extended to arbitrary binary cubic forms by Heath-Brown and Moroz (Proc. 
London Math. Soc. (3) 84 (2002), 257-288). Results of this type give the 
sparsest polynomial sequences currently known to contain infinitely many 
primes. It would be very interesting to have a similar result for primes 
of the shape $4a^3 + 27b^2$, since this would show that there are infinitely 
many cubic polynomials with integer coefficients and prime discriminant. 
It would also resolve the open conjecture that there are infinitely many 
non-isomorphic elliptic curves defined over the rationals and having prime 
conductor. 
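
To get a feeling for how thin the sequence $a^2 + b^4$ is, one can list the primes of this form below a small bound. This is a minimal sketch, assuming a and b range over positive integers; `is_prime` and `primes_a2_plus_b4` are hypothetical helper names.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def primes_a2_plus_b4(limit):
    """Sorted list of the primes p <= limit with p = a^2 + b^4, a, b >= 1."""
    found = set()
    b = 1
    while b ** 4 < limit:
        a = 1
        while a * a + b ** 4 <= limit:
            n = a * a + b ** 4
            if is_prime(n):
                found.add(n)
            a += 1
        b += 1
    return sorted(found)
```

Below 50 the values represented are 2, 5, 10, 17, 26, 37, 50 (with b = 1) and 17, 20, 25, 32, 41 (with b = 2), of which 2, 5, 17, 37, and 41 are prime.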

It follows from the Prime Number Theorem that for numbers around x the 
average gap between consecutive primes is asymptotically log x. However 
it is known that gaps which are much smaller, and much larger, can occur. 
On the one hand, Goldston, Pintz, and Yıldırım (in work still to appear, as 
of 2007) have shown that 

$$\liminf_{n\to\infty} \frac{p_{n+1}-p_n}{\log p_n} = 0,$$

and even that 

$$\liminf_{n\to\infty} \frac{p_{n+1}-p_n}{(\log p_n)^{1/2}(\log\log p_n)^{2}} < \infty.$$

In the other direction Pintz (J. Number Theory 63 (1997), 286-301) has 
proved that there are infinitely many primes $p_n$ for which 

$$p_{n+1}-p_n \ge 2\bigl(e^{\gamma}+o(1)\bigr)\,\log p_n\,\frac{(\log\log p_n)(\log\log\log\log p_n)}{(\log\log\log p_n)^{2}}$$

(where $\gamma$ is Euler’s constant). 
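
The normalised gap $(p_{n+1}-p_n)/\log p_n$ appearing in these results can be tabulated for small primes. The sketch below only illustrates that ratios both below and above 1 occur in a short range; it says nothing about the limiting behaviour, and the names `primes_up_to` and `normalized_gaps` are assumptions made for this example.

```python
import math


def primes_up_to(limit):
    """List of the primes not exceeding limit (simple sieve, limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [k for k in range(limit + 1) if sieve[k]]


def normalized_gaps(limit):
    """Ratios (p_{n+1} - p_n) / log p_n for consecutive primes up to limit."""
    ps = primes_up_to(limit)
    return [(q - p) / math.log(p) for p, q in zip(ps, ps[1:])]


ratios = normalized_gaps(10_000)
smallest, largest = min(ratios), max(ratios)
mean = sum(ratios) / len(ratios)
```

Twin primes such as 11 and 13 give a ratio below 1, while the longer runs of composites in this range give ratios well above 1.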

One of the most remarkable recent results on primes is due to Green and 
Tao (Annals of Math., to appear), and states that the primes contain arbitrarily 
long arithmetic progressions. The longest such progression currently 
known (2007) has length 23, and consists of the primes 


56211383760397 + 44546738095860k (k = 0, 1, . . . , 22), 


found by Frind, Underwood, and Jobling. 
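
The progression can be checked directly. The following sketch uses a Miller-Rabin test with the first twelve primes as bases, a base set that is known to give a deterministic answer for all n below 2^64 and therefore covers the numbers here (about 10^15); this is a convenience for the check and not the method by which the progression was found. The names `is_prime`, `START`, and `STEP` are assumptions of this example.

```python
def is_prime(n):
    """Miller-Rabin test; deterministic for n < 2**64 with these twelve bases."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


START = 56211383760397
STEP = 44546738095860
terms = [START + STEP * k for k in range(23)]
all_prime = all(is_prime(t) for t in terms)
```

Note that the common difference is divisible by every prime up to 23, as it must be for a progression of 23 primes whose first term exceeds 23.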
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km 
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p. 264 
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§5.5 
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primitive root of unity 

§5.6 
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§6.8 

p. 89 

primitive root of m 

§6.8 
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minimal residue (mod m) 

§6.11 
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Euclidean number 

§ 11.5 

p. 204 
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p. 204 

algebraic field 
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p. 264 

simple field    § 14.7    p. 274 

Euclidean field    § 14.7    p. 274 

squarefree    § 17.8    p. 335 

linear independence of numbers    § 23.4    p. 510 
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Chowla, 137 

Chudakov, 594 

Cipolla, 101 

Clausen, 119 

Cook, 446 

Copeland, 164 

van der Corput, 359, 499, 594 
Coxeter, 26, 27, 595 


Darling, 391 
Darlington, 137 

Davenport, 27, 77, 445, 446, 448, 

544-8, 595 
Davis, 594 
Dedekind, 503, 596 
Democritus, 50 

Diamond, 44, 497, 499, 588, 591 
Dickson, 12, 26, 44, 101, 137, 164, 197, 
260-2, 281, 316, 317, 359, 390, 391, 
416, 417, 444, 447-50, 497, 546, 

589, 595 

Diophantus, 261, 589 
Dirichlet, 16, 22, 77, 119, 146, 201, 202, 
217, 227, 318, 320, 323, 329, 338, 339, 
341, 359, 417, 501, 579, 581, 596 
Dress, 447, 448 
Dudley, 593 
Duparc, 102 
Durfee, 371, 372 

Dyson, 227, 228, 383, 391, 392, 547, 576 


Edwards, 261, 341 
Eisenstein, 77, 137, 244 
Elkies, 450, 590 
Enneper, 390 

Eratosthenes, 4, 6, 7, 13, 593 

Erchinger, 77 

Erdős, 27, 119, 498, 545 

Errera, 499 

Escott, 449 

Estermann, 417, 514, 522, 594, 596 
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Euclid, 3, 5, 12, 14, 15, 17, 20, 22, 25, 47, 
50, 71, 172, 174, 204, 227, 231, 232, 
234, 238, 239, 241, 244, 274-81, 293, 
299, 300, 301, 311, 312, 405, 451, 547, 
564, 570, 575 
Eudoxus, 47 

Euler, 18, 19, 27, 63, 77-9, 81, 100, 101, 
102, 258, 260-2, 285, 316, 317, 320, 
341, 363, 366, 367, 371, 376, 378, 380, 
382, 390, 391, 416, 440, 449, 450, 

498, 594 


Farey, 28, 36-7, 44, 354 
Fauquembergue, 450 
Fermat, 7, 17, 18-20, 21, 23, 72, 77, 
78-102, 108-11, 116, 135, 245, 247, 
248, 249, 261-2, 263, 285-6, 288, 395, 
397, 440, 450, 550, 565, 567, 586-8, 
590-2, 598 
Ferrar, 526 
Ferrier, 19, 27 

Fibonacci, 190, 192, 197, 290 

Fleck, 449 

Franklin, 379, 391 

Froberg, 102 

Frye, 450 

Fuchs, 449 


Gandhi, 593 

Gauss, 12, 17, 46, 55, 66, 71, 72, 77, 78, 
92-5, 102, 137, 230, 235, 238, 244, 317, 
359, 390, 391, 401, 417, 531, 546, 

556, 596 
Gegenbauer, 359 
Gelfond, 55, 227, 228, 576 
Gerardin, 263, 450 
Glaisher, 137, 417, 497 
Gloden, 449 
Goldbach, 23, 27, 594 
Goldberg, 101 
Golomb, 593 
Golubew, 500 
Grace, 398, 416 
Grandjot, 77 
Gronwall, 359 
Gruenberger, 12 
Grunert, 164 
Gupta, 383, 390, 391 
Guy, 27 
Gwyther, 391 


Hadamard, 13, 499 
Hajós, 44, 545 
Halász, 360 

Halberstam, 416, 594, 596 
Hall, 44, 498 
Hallyburton, 26 
Hamilton, 416 

Hardy, 137, 204, 216, 341, 359, 383, 391, 
417, 444-9, 498, 499, 522, 596 
Haros, 44 

Hasse, 27, 573, 590, 596 
Hausdorff, 164 
Heaslet, 417, 597 
Heath, 50, 55, 261, 360 
Hecke, 27, 119, 204, 596 
Heilbronn, 274, 281, 445, 547 
Hermite, 56, 228, 416, 545 
Hilbert, 228, 394, 416, 444, 445, 448, 589, 
596 

Hlawka, 548 
Hobson, 164, 227 
Hölder, 316 
Hua, 262 
Hunter, 449 

Hurwitz, Adolf, 44, 102, 228, 416, 417, 
449, 545 

Hurwitz, Alexander, 19 
Huxley, 44, 359, 360 


Ingham, 13, 26, 301, 341, 497, 595, 596 
Iwaniec, 417, 594 


Jacobi, 244, 317, 341, 372, 375, 377, 382, 
390, 391, 416, 417, 589 
Jacobsthal, 137 
James, 445, 446 
Jensen, 77 
Jessen, 522 
Jones, 13, 594 
Jung, 596 


Kac, 498 
Kalmar, 497 
Kempner, 444, 445, 449 
Khintchine, 228 
Kishore, 316 
Kloosterman, 68, 77 
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Knorr, 55 

Koksma, 228, 522, 544, 546, 596 
König, 164, 505, 522 
Korkine, 546 
Kraitchik, 13, 26, 27 
Krečmar, 383, 391 
Kronecker, 77, 501-22, 589 
Kubilius, 498 
Kuipers, 164, 522, 596 
Kummer, 261 


Lagrange, 110, 119, 126, 197, 255, 399, 
416 
Lai, 13 

Lambert, 55, 339 

Landau, 13, 26, 44, 77, 102, 228, 261, 262, 
301, 316, 341, 359, 360, 417, 444, 445, 
497, 499, 545, 547, 595, 596 
Lander, 450 
Landry, 18 
Lebesgue, 164, 590 
Leech, 263, 450, 500 

Legendre, 78, 85, 101, 102, 262, 416, 417, 
424, 573 

Lehmer, D. H. 12, 19, 26, 102, 190, 197, 
301, 383, 391, 500 
Lehmer, D. N. 12, 497 
Lehmer, E. 500 
Lehner, 383, 391 
Leibniz, 101 

Lekkerkerker, 545, 548, 596 
Létac, 449 

Lettenmeyer, 512, 514, 522 

Leudesdorf, 130, 137 

LeVeque, 101, 597 

Lindemann, F. 228 

Lindemann, F. A. (see Cherwell) 27 

Linfoot, 281 

Linnik, 444 

Liouville, 206, 227, 417, 448 
Lipschitz, 416, 417 

Littlewood, 13, 26, 444-9, 499, 522, 597 
Lucas, 12, 19, 20, 26, 102, 190, 290, 

293, 301 


Macbeath, 546, 548 
McCabe, 51, 55 
Maclaurin, 115 

MacMahon, 368, 379, 380, 383, 390, 391, 
597 


Mahler, 447, 450, 546, 592 
Maillet, 449 
Manin, 262, 590 
Mapes, 12 
Markoff, 546 
Mathews, 77 
Matijasevic, 594 

Mersenne, 18-21, 26, 27, 100, 101, 190, 
261, 290, 291, 311, 312 
Mertens, 359, 466, 498 
Miclavc, 522 

Miller, 19, 20, 26, 102, 391, 590 
Mills, 498 

Minkowski, 37-44, 417, 523, 534, 540, 
544, 545, 547, 548, 597 
Möbius, 304, 305, 316, 328-30, 479 
Moessner, 450 
Montgomery, 301, 594 
Mordell, 40, 44, 261, 262, 391, 417, 434, 
449, 450, 523, 545, 546, 548, 564, 589, 
590, 597 
Moser, 498 
Mullin, 244 


Nagell, 559, 589, 597 
Napier, 9 

Narasimhamurti, 446 
Netto, 390 
von Neumann, 164 
Nevanlinna, 499 
Newman, 301, 380 
Newton, 435, 589 
Nickel, 19 

Niederreiter, 164, 522, 596 
Niven, 56, 164, 447, 597 
Noll, 19 
Norrie, 450 


Olds, 197, 597 
Oppenheim, 55, 547 
Ore, 597 


Palamà, 449 
Parkin, 450 
Patterson, 450 
Pearson, 101, 102 
Pell, 281 

Perron, 197, 228, 597 
Pervusin, 19 
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Pillai, 447 
Pintz, 13, 594 
Plato, 50, 51 
van der Pol, 316 
de Polignac, 497 

Pólya, 17, 26, 44, 164, 316, 341, 359, 499, 
595, 597 
Prachar, 597 
Prouhet, 435, 449 
Putnam, 594 

Pythagoras, 46, 47, 50, 55, 261 


Rademacher, 44, 383, 597 
Rado, 55, 119, 545 

Ramanujan, 67, 68, 77, 260, 308, 316, 336, 
341, 350, 359, 380, 382, 383, 385, 
389-92, 417, 498, 590, 596, 598 
Rama Rao, 119, 137 
Reid, 281 
Remak, 547 
Ribenboim, 27, 261 
Richert, 594, 596 
Richmond, 77, 262, 434, 449 
Riemann, 320, 341, 581, 589 
Riesel, 19, 594 
Riesz, 341 
Robinson, J. 594 
Robinson, R. M. 19, 102 
Rogers, 383, 385, 391, 392, 548, 597 

Roth, 227, 576, 594, 596, 597 
Rubugunday, 447 
Ryley, 262 


Saltoun, 416 
Sambasiva Rao, 446 
Sastry, 450 
Sato, 591, 594 
Schmidt, 227 
Schneider, 228 
Scholz, 444, 597 
Schur, 385, 391, 449 
Seelhoff, 19 
Segre, 262 

Selberg, A. 392, 478, 498, 499 
Selberg, S. 499 
Selfridge, 19, 27, 102 
Shah, 499 
Shanks, 597 

Siegel, 227, 545, 574, 576, 578, 591 


Sierpinski, 392 
Skolem, 390 
Skubenko, 547 

Silverman, 567, 589, 591, 598 

Smith, 417, 597 

Sommer, 281, 597 

Staeckel, 499 

Stark, 197, 281, 597 

von Staudt, 115, 116, 119 

Stewart, 261 

Subba Rao, 450 

Sudler, 392 

Sun-Tsu, 137 

Swinnerton-Dyer, 263, 383, 391, 444, 450, 
547, 581, 582, 591, 592 
Szegő, 26, 316, 641 
Szücs, 505 


Tarry, 435, 449 

Tate, 567, 589-92, 598 

Taylor, 219, 261, 262, 588, 591 

Tchebotaref, 537, 548 

Tchebychef, 11, 13, 497, 498, 522 

Theodorus, 50, 51, 55 

Thue, 227, 576 

Titchmarsh, 341 

Toeplitz, 597 

Tong, 446 

Torelli, 497 

Tuckerman, 19, 26, 293 
Turán, 498 


Uspensky, 417, 597 


de la Vallée-Poussin, 13, 499 
Vandiver, 261 

Vaughan, 446, 447, 448, 594, 597 
Vieta, 262, 450 

Vinogradov, 445, 446, 448, 499, 594, 597 
Voronoi, 359 


Wada, 594 

van der Waerden, 55 

Wall, 197 

Waring, 101, 119, 393, 394, 416, 419, 431, 
444-7, 449 
Watson, G. L. 444 

Watson, G. N. 383, 385, 391, 467, 526, 548 
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Weber, 545 

Weil, 77, 564, 588, 589, 590, 592 
Wellstein, 545 
Western, 301, 444 
Weyl, 522 

Wheeler, 19, 20, 26, 102 
Whitehead, 102, 390 
Whitford, 281 
Whittaker, 467, 526 
Wieferich, 261, 444, 445, 448 
Wiens, 594 
Wigert, 359 
Wilson, B. M. 359 

Wilson, J. 85, 101, 109-11, 119, 132, 
135, 137 
Wirsing, 360 


Wolstenholme, 112-14, 119, 130, 133, 134 

Woods, 545, 547, 548 

Wright, 102, 137, 390, 449, 498, 499 

Wunderlich, 27, 447 

Wylie, 137, 228 


Young, G. C. 164 
Young, W. H. 164 


Zassenhaus, 545 
Zermelo, 27, 164 
Zeuthen, 50-2 
Zolotareff, 546 
Zuckerman, 164, 316 
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Note : References to footnotes are denoted 
by (f.n.) after the page number. 

Some symbols which have special- 
ized meanings, or which are easily 
confused, are included at the begin- 
ning of this index. 

→ [implies] vi 
→ [tends to] vi 
≡ [logically equivalent] vi 
≡ [congruent] vi, 58, 103-4 
. [and] vi, 2 
0, o, x 7-8 

- 9 

* 16 (f.n.) 

(?) 85 

[x] [integer part] 93 

$[a_0, \ldots, a_n]$ [continued fraction] 165 

(x) 201 

x 201 

[a, P] [basis for lattice] 295 
[p] [class of multiples] 296 


additive theory of numbers 254, 338, 361 
aggregates, theory of 227 
algebraic equation 203 
algebraic field 264 
see also k(ϑ) 
algebraic integer 229, 265 
algebraic number 203-4, 204 (f.n.), 

229, 264 
degree 204 

enumerability of aggregate of 205 
order of approximation to 202-3, 206 
primitive equation satisfied by 265-6 
algorithm 

continued fraction 172-5 
Euclid’s, see Euclid’s algorithm 
almost all 9, 156 
approximation 

closest 208-10, 212, 216-17 
good 194, 196-7 
order of 202-3 
to quadratic irrational 203 
rapid 198 

to reals by rationals 37 


simple 198, 199 

Dirichlet’s argument 201-2 
simultaneous 200, 217-18, 227 
area 

of bounded region 540 
of convex region 38 
arithmetic, see fundamental theorem of 
arithmetic 
associate 83, 113 
in k(i) 233-4, 236 
in k(ρ) 244 

asterisk on Theorem number 16 (f.n.) 
asymptotic equivalence 9 
average order 347, 360 

Bachet’s problem 147-8 
basis 

of integers of k(√m) 268 
of lattice 295 

Bauer’s congruence 126-8, 137 
consequences 132-4 
Bernoulli’s numbers 115, 118 
Bertrand’s postulate 455-7, 497-8 
best possible inequality 529-30 
binomial coefficients 79-81 
to prime exponent 80-1 
binomial expansion to prime exponent 
80-1, 110 

biquadrates, representation by sums of 
419-20 

biquadratic field 299-300 
Birch-Swinnerton-Dyer conjecture 
weak form of 592 
Borel-Bernstein theorem 215 
boundary of open region 38 
bounded region 38 


Cantor’s diagonal argument 205 
Cantor’s ternary set 158 
Carmichael number 89, 101 
Catalan’s conjecture 263 
Chinese remainder theorem 
121-2, 137 

class of residues 58-9 
in k(ρ) 244 
closed region 38 
closed set 155 
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c n (m) [Ramanujan’s sum] 67-8, 77 
evaluation 308-10 
generating function 326-8 
combinatorial argument, even and odd 
partitions 380 

combinatorial proofs 368, 371, 379-80 
common factor 58 

complete quotient, see continued fraction 
complete system of incongruent 
residues 59 

complex multiplication 556 
composite number 2 
long blocks 6 
see also prime number 
computers, uses of 19, 27, 293 
congruence 58 

algebraic, number of roots 123 
to composite modulus 122-3 
to coprime moduli 121 
history 77 
in k(ρ) 243 
to lcm of moduli 60 
mod $p^2$ 86, 91 

to prime modulus 81, 107, 306 
to prime power modulus 123-4 
properties 60 
system of linear 120 

unique solution 121-2, 137 
see also linear congruence 
conjugate, in k(√m) 268 
conjugate partitions 362 
construction, see Euclidean construction 
continued fraction 52, 165, 197 
algorithm 172-5 

approximation by convergents 175-6, 
194-7, 198 

bounded quotients 212-15 
complete quotient 170, 178 
finite 165 

infinite simple 177-8 

irrational 178-80 

periodic 184-7 

Ramanujan’s 389-90 

representation of rational number 170-2 

simple 168 

and simple approximation 196, 199 
and solutions of Pell’s equation 271 
uniqueness of representation of number 
169, 172, 174, 179 
see also convergents to a continued 
fraction 


continuity, arguments from 524 (f. n.) 
continuum, Farey dissection 36-7 
convergents to a continued fraction 166, 
175-6, 180 
consecutive 210-11 
even and odd 169, 178 
successive 168, 180-1 
convex region 38-9, 44, 523 
area 39 

equivalence of definitions 38 
symmetrical, contains lattice points 524 
coprime numbers 58 
probability 354 
see also 
cubes 

equal sums of two 257-9, 262 
expression of rational number as sum of 
three 255, 261, 262 
representation of number by sums of 
420-2 

see also Fermat’s last theorem; g(k); 
G(k); Waring’s problem 

cubic form, minimum 547 

cyclotomic field 300, 300 (f.n.) 



decimal 130 
irrational 145-6 
length of period 147-8 
mixed recurring 141-2, 143 
pure recurring 141 
recurring 141 

in scales other than ten 144-5, 149-51 
terminating 140, 142 
uniqueness 140-1 

degree of algebraic number 204, 264 

dense 155,503 

dense in itself 155 

derivative of a set 155 

derived set 155,503 

descent, method of 248, 251, 395, 397 
determinant 

of a lattice 523-4 
of a quadratic form 526 
diagonal argument 205 
digits, missing, see missing digits 
Diophantine equation 549, 550 
$ax + by = n$ 25 
$x^2 + y^2 = n$ 313-14 
$x^2 - 2y^2 = 1$ 271 
$x^2 - my^2 = 1$ 271 
$x^2 + y^2 = z^2$ 245 
$x^3 + y^3 = 3z^3$ 253 
$x^3 + y^3 + z^3 = t^3$ 257-61 
$x^4 + y^4 = z^2$ 247-8 
$x^4 + y^4 = z^4$ 247 
$x^4 + y^4 = u^4 + v^4$ 260 
$x^n + y^n = z^n$ 245 
$x^p - y^q = 1$ 263 
equal sums of three 5th or 6th 
powers 444 

equal sums of two kth powers 442 
kth power as sum of kth powers 440 
history 261 

see also Fermat’s last theorem 
Dirichlet’s divisor problem 347, 359 
Dirichlet series 318, 341, 581 
convergence 318 
differentiation 318 
formal theory 329-31 
multiplication 320, 326 
uniqueness 320 

Dirichlet’s pigeonhole principle 201-2, 
227 

Dirichlet’s problem 501 
Dirichlet’s theorem [on primes in an 
arithmetical progression] 16 
divisibility 
in k(√m) 268 

of polynomials (mod m) 105-6 
tests for 146-7, 164 
divisible 1 
divisor 1 
in k(i) 235 
in k(√m) 268 
see also d(n); $\sigma_k(n)$; σ(n) 
$d_k(n)$ [number of expressions in k 
factors] 334 
generating function 334 
d(n) [number of divisors] 310 
average order 347-50 
generating function 327 
generating function of $\{d(n)\}^2$ 336 
normal order 477-8 
order of magnitude 342-6, 359 
in terms of prime factorization 311 
duplication formula 553, 564 
Durfee square 371 


e 

irrational 46, 55 
transcendental 208, 218-22, 228 
Eisenstein’s theorem [on residues mod $p^2$] 
135, 137 

elliptic curve discrete logarithm problem 
(ECDLP) 590 
elliptic curves 

addition law on 550-6 
congruent numbers 549-50 
and Fermat’s last theorem 586-8 
integer points on 574-8 
L-series of 578-82 
modulo p points 573 
points of finite order 559-64 
and modular curves 582-6 
rational points group 564-73 
elliptic functions 372-7, 389-90, 395, 
410-11,416 
Jacobi’s identity 372-7 
elliptic integrals 589 
endomorphism 555-6 
enumerable set 156 
E( Q) 564,565 
equivalence of congruent 
numbers 59 

equivalent numbers 181-4 
Eratosthenes’ sieve 4-5 
see also sieve methods 
Euclidean algorithm 570 
Euclidean construction 17,71,204 
and Fermat primes 71 
of regular pentagon 52 
of regular polygon 71-6 
of regular 17-gon 

geometrical details 76 
proof of possibility 71-6 
see also quadrature of circle 
Euclidean field 274, 275-6 

fundamental theorem of arithmetic 
in 275 

real 276-80,281 
Euclidean number 204 
Euclid number 312 
Euclid’s algorithm 174,231-2 
history 234 

Euclid’s first theorem [on prime divisors of 
a product] 3-4 
source in Euclid 12 
Euclid’s second theorem [existence of 
infinitely many primes] 5, 14 
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Euclid’s second theorem [existence of 
infinitely many primes] ( continued ) 
proofs 14,17,20 
source in Euclid 13 
Euler-Maclaurin sum formula 115 
Euler’s conjecture [on sums of powers] 
440-2 

Euler’s constant, see γ 
Euler’s function, see φ(m) 

Euler’s identities 366-9, 376, 378 
combinatorial proofs 368-9 
Euler’s theorem [on even/odd partitions] 
378-80 


factorial 

divisibility by 80 
residue of (p-1)! mod p 87 
factors, tables of 12 
factor theorem mod m 105-6 
Farey arc 36 
Farey dissection 36-7 
Farey point 36 
Farey series, see $\mathfrak{F}_n$ 
Fermat-Euler theorem 78 
Fermat prime, and Euclidean 
construction 72 

Fermat’s conjecture [on primality of F n ] 
7, 18 

Fermat’s last theorem 91, 245, 261-2 
exponent two 245-7 
exponent three 248-53 
exponent four 247-8 
exponent five 300 
Fermat’s numbers, see F n 
Fermat’s theorem [on congruence mod p ] 
78, 108 

converse 89-90 
history 101 
in k(√5) 288-90 
in k(i) 285-6 
Lagrange’s proof 110-11 
mod $p^2$ 135-6 
Fibonacci numbers 
prime 192-3 

prime divisors 192-3,290 
Fibonacci series 190-3, 197 
history 197 (f.n.) 
field 

algebraic, see k(ϑ) 
biquadratic 300 


cyclotomic 300, 300 (f.n.) 

Euclidean, see Euclidean field 
quadratic, see quadratic field 
rational, see k(1) 
simple 274, 276, 301 
$\mathfrak{F}_n$ [Farey series] 28, 354 
characteristic properties 28-9 
proof by construction of next 
term 31-2 

proof by induction 29-31 
proof using lattices 35 
history 44 

successive terms 28-9 
$F_n$ [Fermat’s numbers] 18, 100, 102 
condition for primality 100-1 
factorization of F 5 18 
probabilistic argument against primality 
18 (f.n.) 

formal product of series 324-5 
four-square representation theorem, see 
representation of integers 
fraction, see continued fraction 
frequency of a digit 159 
fundamental lattice 33, 534 (f.n.) 

linear transformation 33-4 
fundamental theorem of arithmetic 3-4, 
231-4 

analytical expression 321 
in Euclidean field 275 
false in some fields 273-4 
history 12, 234, 244 
in k(i) 238-41 
in k(ρ) 243 
proofs 25 

use of, in proofs of irrationality 49 


games, see Nim 

γ [Euler’s constant] 47 (f.n.), 347, 461 
problem of irrationality 46 
Gaussian integer, see k(i) 

Gauss’s lemma 92-4 
Gauss’s sum, see S(m, n) 
generalized Weierstrass equation 557 
discriminant 558 

generating function 318,331-7,343 
non-Dirichlet 338-41,362 
geometry of numbers 523 
g(k) [number of kth powers to represent all 
numbers] 394-5 
existence of g(3) 422-4 
existence of g(4) 419-20, 448 
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existence of g(6) 424—5 
existence of g(8) 425 
lower bound 425-6 
value of g( 2) 409 
value of g(3) 424 
value of g(4) 419-20, 448 
value of g(6) 425 
value of v(8) 425 
see also v(k) 

G(k) [number of kth powers to 
represent all large enough 
integers] 394-5 
existence of G( 3) 420-2 
lower bounds 426-30 
value of G(2) 409 
Goldbach’s conjecture 23, 594 
golden section 52, 208 


highest common divisor 24, 57, 232 
divisible by every common divisor 25, 
232-4 

formula in terms of prime factors 57 
of Gaussian integers 240 
in non-simple fields 293-4 
relationship with lcm 57 
right-hand, of quaternions 405-7 
homogeneous linear forms, values at lattice 
points 524—5 

boundary case (Hajos) 545 


ideal 295-9 
principal 295, 297-8 
see also right-ideal; principal right-ideal 
inclusion-exclusion theorem 302-3, 316 
index 89 (f.n.) 

inequality, best possible 529-30 
integer 1,267 
of k(√m) 265 
of k(ρ) 241-4 

as sum of powers, see representation of 
integers 

see also algebraic integer; Gaussian 
integer; quadratic integer; rational 
integer 

integral lattice, see lattice 
integral part 93 
integral polynomial 103 
interior point 38 
inverse map 557 
inversion formula 


general 307 
Mobius 305-6 

irrationality of algebraic numbers 229 
irrational number 45 

approximation by rationals 37, 
198-201,203 

continued fraction representation 178-9 
decimal representation 145-6 
e 46, 53-4 

examples known 46-7, 145, 163 
fractional parts of multiples dense in 
interval 501-2 
geometric proof for √5 52 
logarithms 53 
π 46, 54-5 
$\pi^2$ 54-5 

rational powers of e 54 
roots of algebraic equations 46, 48 
roots of integers 47-8 
isomorphic elliptic curves 550 

Jacobi’s identity 372-7 
j-invariant of E 550 


k(1) [field of rationals] 230 (f.n.) 

k(√2) 

primes 287 
unities 270 
k(√2 + √3) 299-300 
k(√2 + i) 299 
k(√5) 

primes 287-8 
unities 288 

k(exp 2πi/5) [cyclotomic field] 300, 301 
k(i) [Gaussian integers] 231, 235-41 
fundamental theorem of arithmetic 
in 238-41 
history 244 (f.n.) 
primes 283-4 
unique factorization in 231 
k(√m) 264 
integers of 267-70 
when Euclidean 276-80 
k(ρ) 231 

and Fermat’s last theorem 249 
fundamental theorem of arithmetic 
in 243 

integers in 241-4 

primes 286-7 

unique factorization in 231 
k(ϑ) [algebraic field] 264 
Kloosterman’s sum, see S(u, v, n) 
Kronecker’s theorem 501-2, 522 
analytical proof (Bohr) 517-20 
astronomical illustration 512 
geometrical proof (Lettenmeyer) 503, 
512-14 

inductive proof (Estermann) 

514-17 

equivalence of two forms 511 
general form 509-10 
homogeneous form 510 
in k dimensions 508-12 
in one dimension 501-5 
proof by ^-chaining 502 
representation on circle 503 
with bound for error 504 


Lagrange’s theorem, see representation of 
integers 

λ(n) [parity of number of prime 
factors] 335 
generating function 335 
Λ(n) [log p if n is a power of p] 331-4, 
451 

generating function 332-3 
and μ(n) 334 
Lambert series 339 
lattice 32-3, 295, 540 
determinant of 523-4 
equivalence 33,35,41 
equivalence in n dimensions 523 
equivalent points 42-3 
fundamental parallelogram 41 
in n dimensions 523 
least common multiple 57 

formula in terms of prime factors 57 
relationship with highest common 
divisor 57 

Legendre’s symbol 85, 101, 573 
Leudesdorf’s theorem 130-2, 137 
Li [logarithm integral] 13 
limit point of set 155,164 
linear congruence 60-2 
division through 61 
existence of solution 62 
number of solutions 62 
uniqueness of solution 62 
linear forms, homogeneous 
values taken 524-5, 527-9 


values taken by product of 526, 

529-30, 532 
at equivalent points 534 
values taken by sum of moduli 525, 529 
values taken by sum of squares 526, 
529-32 

linear forms, non-homogeneous 534 
values taken by product of 534-6, 

537-9 

linear independence 508-9 
of logarithms of primes 509 
Liouville numbers 206-8 
Liouville’s theorem 206-7, 227 
log 9 (f.n.) 

slowness of growth 9-10 
logarithmic height 571 
logarithm integral, see Li 
Lucas series 190-3 

Lucas’s test for primality 19, 290-3, 301 
see also M p 

Markoff number 546 
measure of a set 156 (f.n.) 
measure zero 155, 158, 205 
see also null set 
Mersenne number, see M p 
Mertens’s theorem 466-9 
method of descent 248, 251, 395, 397 
minimal Weierstrass equation 558 
Minkowski’s theorem 37-8, 39-40 
applications 524-6, 545 
converse 540 
developments 40-3 
generalization 545 
Hajós’s proof 44 

in higher dimensions 43, 523-4, 545 
Minkowski’s proofs 39, 44 
Mordell’s proof 40, 44 
Minkowski’s theorem on 

non-homogeneous forms 534-7 
missing digits 
integers 154-5 
decimals 157-8 
Möbius function, see μ(n) 

Möbius inversion formula 305-6 
analytical interpretation 328-31 
modular curve 585-6 
moduli problem 582, 584 
modulus [collection of numbers] 23-5, 27, 
33, 231 (f.n.), 295 
characterization 24 
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modulus [of congruence] 58, 58 (f.n.), 88 
M p [Mersenne number] 19, 21 (f.n.), 

26, 190 

composite 100 

Lucas’s test for primality 19, 290-3, 
301 

see also perfect number 
multiplication-by-m map 554 
multiplicative function 64, 77, 305 
condition for limit zero 343-5 
multiplicative theory of numbers 338 
μ(n) [Möbius function] 304, 316 
generating function 326 
M(x) [sum of μ(n) for n up to x] 356 
Mertens’s conjecture 356, 359 
order of magnitude 356, 489-90 

N [is a non-residue of] 84 
neighbourhood of real number 155 
Nim 151-4, 164 
losing position 164 
non-negative integer 1 
non-residue, see quadratic non-residue 
norm 

in k(i) 235 
in k(√m) 268 
in k(ρ) 241 

normal number 158-64 
examples 164, 164 (f.n.) 
normal order 473 
null set 156,212,216 
number 1 

see also algebraic..; composite..; 
coprime..; integer; irrational..; 
normal..; perfect..; prime..; rational..; 
round..; squarefree..; transcendental.. 

ω(n) [number of different prime factors] 
335, 471 

average order 472-3 
generating function of $2^{\omega(n)}$ 335 
normal order 473-6 
Ω(n) [total number of prime 
factors] 471 
average order 472-3 
normal order 473-6 
open region 38 
area 39,42 

order, average 347, 360 
order [of a number, mod m] 88-9 


order of approximation 202-3 
order of magnitude 8 


$P_2$ [prime or product of 2 primes] 594 
parallelograms, tiling of plane by 43 
partial quotient 165 
partition 361-2 
conjugate 362 

graphical representation 361-2 
into an even or odd number of parts 
378, 379-80 
rank 383 

restricted, generating functions 365-6 
self-conjugate 368-9 
unrestricted 361 
see also p(n) 

Pell’s equation 271, 281 
perfect number 20, 311-13 
even 312-13 
and Mersenne primes 312 
odd 312 

perfect set 155, 158 
period of continued fraction 184-5 
φ(m) [Euler’s function] 63-5, 232 
average order 353-4 
generating function 327 
inversion 65, 303 
order of magnitude 352-3,469-71 
and trigonometric sums 65-70 
value 64, 65, 303 

π 

irrationality 46, 54-5 
irrationality of $\pi^2$ 54-5 
transcendence [transcendentality] 208, 
223-7, 228 

$\pi_k(x)$ [number of products up to x of k 
different primes] 491 
asymptotic expansions 499 
asymptotic value 491-4 
π(x) [number of primes up to x] 7 
asymptotic value 458-60 
formula 593 

and logarithm integral 13 
order of magnitude 11, 15 
rate of growth 21 
values 4-5 

see also prime number theorem 
P(k, j) [Prouhet-Tarry number] 435-7 
values 437-40, 449 
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p n [nth prime] 5 

approximate value 12 
formula for 6, 593 
order of magnitude 12,460 
rate of increase 14,17 
size 21 

p(n) [number of partitions] 361 
calculation 378 

congruence properties 380-3,391 
generating function 362-5 
table of values 379, 391 
point at infinity 552 
point-lattice, see lattice 
polygon, constructive regular, see 
Euclidean construction 
polynomial 569-70, 584, 585 
composite values 22, 82, 146, 593-4 
divisibility by a prime power 105-6 
integral 103-4 

linear factorization mod p 108 
primitive 265 

polynomial equation, homogeneous 556-7 

positive integer 1 

primality 

tests for related to Fermat’s theorem 
98-100, 102 

Wilson’s theorem as test for 86 
prime factorization 
in 270 

uniqueness, see fundamental theorem of 
arithmetic 

prime factorization theorem 2 
prime factors 

number of, see Ω(n) 
of a product 3 
prime number 2-3 

in arithmetical progressions 15-16, 27, 
145-6 

average distribution 5 
between x and (1 + ε)x 494 
conjectures 23, 594-5 
distribution, see prime number theorem 
existence of infinitely many, see 
Euclid’s second theorem 
expressible as sum of two squares 284 
first few 3-4 
of the form 3n + 1 287 
of the form 4n + 1 16, 87-8, 284, 337 
of the form 4n + 3 15, 112, 337 
of the form 5m ± 1 192 
of the form 5m ± 2 192 
of the form 6n + 1 95 
of the form 6n + 5 16, 95 
of the form 8n ± 1 94 
of the form 8n ± 3 94 
of the form 8n + 5 16 
of the form 10n ± 1 95, 98 
of the form 10n ± 3 95, 98 
of the form $n^2 + 1$ 22 
of the form $an^2 + bn + c$ 23 
of the form $2^n + 1$ 18 
formulae for 1-2, 458 
history 497 
large 5, 19, 26 
recurrence formula 7 
regular 261 

sum of reciprocals 20, 464-6, 497 
tables 4-5, 12 
use of computers 26 
see also composite number; primes 
prime number theorem 7, 10-11, 451, 
463-4 

numerical evidence 11 
proof 478-89 
prime-pairs 6 

distribution 6, 13, 495-7 
existence of infinitely many 6 
primes 

of k(√2) 287 
of k(√5) 287-8 
of k(i) 233, 236-7 
of k(√m) 268, 270, 283 
of k(ρ) 286-7 
problems 23, 594-5 
prime-triplets 6 
distribution 13,499 
existence of infinitely many 6 
primitive equation 265 
primitive polynomial 265 
primitive root 72 (f. n.), 89, 148 
of a prime, number of 89, 306 
of unity 67 

principal right-ideal in k(i) 405-6 
probability arguments 353-4, 496 (f.n.) 
product, see formal product 
of series 

products of k primes, see $\tau_k(x)$; $\pi_k(x)$ 
Prouhet and Tarry’s problem 
435-7, 449 
pseudo-prime 90, 102 

existence of infinitely many 90 
ψ(x) [sum function of Λ] 451 
order of magnitude 451-2 
Pythagoras’ theorem [on irrationality of 
√2] 47 
history 50 

Pythagorean triples 245-7 


$q_k(n)$ [indicator that n has no kth power 
factors] 335-6 
generating function 335-6 
q(n) [indicator that n is squarefree] 335 
generating function 335 
quadratfrei, see squarefree 
quadratic field 264-5, 267-8, 281-2 
arithmetic in non-simple 293-5 
simple complex 275-6, 281 
see also k(√m) 
quadratic form 526 

determinant invariant under unimodular 
substitution 530 
indefinite 532 
positive definite 526 
prime values 23 

values taken by positive definite form 
526, 530 

quadratic integer 229 
quadratic irrational, order of 
approximation 203 
quadratic non-residue 84 
multiplicative properties 87 
of $p^2$ 126 

properties 87-8, 102 
quadratic number 229, 265 
quadratic reciprocity 95-7 
history 101 

quadratic residue 83, 396 
multiplicative properties 87-8 
the number —3 as 95 
the number 2 as 94-5 
the number 5 as 95, 98 
of $p^2$ 126 
properties 87-8 

quadratic surd, as periodic continued 
fraction 185-9 
quadrature of circle 223, 227 
quaternions 395, 416-17 
algebra of 401-3 

highest common right-hand divisor 
405-7 

prime 407-9 


properties of integral 403-5 
quotient, complete, see continued fraction 
quotient of continued fraction 165 
Q(x) [number of squarefree numbers up to 
x] 355-6 


R [is a residue of] 84 

Ramanujan’s continued fraction 389-90 

Ramanujan’s sum, see c n (m) 

rank of algebraic equation 205 

rank of partition 383 

rational integer 1, 229 (f.n.) 

rational number 28 

approximation by rationals 198, 203 
representation by continued fraction 170-2

reciprocals, sum of 154-5 
reciprocity, see quadratic reciprocity 
reflected ray problem 505-8 
region 37 
regular prime 261 
remainder 173 (f.n.) 
representation of integers 

by sums of squares 313-14, 415-16, 417; see also squares
by sums of four squares (Lagrange’s theorem) 255, 399-415, 416
by sums of two cubes 442-4, 450
by sums of kth powers 393-4
see also r(n) 

representative of class of residues 59 
residue 58, 92 
class of 59 
in k(p) 243 
mod p^2 135-6
mod a product 63-4 
see also quadratic residue 
Riemann zeta function, see ζ(s)
right-ideal in k(i) 405 
r(n) [number of representations as sum of 
2 squares] 313-14 
average order 356-8, 360 
formula 315-16 
generating function 337 
order of magnitude 356-8 
see also representation of integers 
Rogers-Ramanujan identities 383-8, 392 
root of congruence 103 
to prime modulus 106-7 
root of polynomial (mod m) 103 
root of unity 67-8
mod p^a 124
round number 476-7 

R(x) [ψ(x)-x] 481

Selberg’s theorem 478-81,498 
set theory, see aggregates, theory of 
Siegel’s theorem 574 
sieve methods 4, 594 
σ_k(n) [sum of kth powers of divisors] 310
generating function 327
generating function of σ_a σ_b 337
σ(n) [sum of divisors] 311
generating function 327
order of magnitude 350-1, 469-71
simple field 274, 276, 300 
simply normal 159 
singular series 445 
S(m, n) [Gauss’s sum] 66, 77
S(p, q) [not Gauss’s sum] 95 (f.n.)
squarefree 20 
integer 264 
number 335, 355-6 
squares 

sum of three 409, 417
sum of two 395-9 
see also representation of integers 
standard form 3 

uniqueness, see fundamental theorem of 
arithmetic 
star region 543 

lattice without points in 543-4 
sum of collection of sets 156
surd, see quadratic surd 
S(u, v, w) [Kloosterman’s sum] 68-70, 77

tables 

of factors 12 
of primes 12 

τ_k(x) [number of products up to x of k primes] 491

asymptotic expansion 490-4, 499 
Tchebotaref’s theorem 537-9 
Tchebychef’s theorem 11, 459
Theodorus’ proofs of irrationality 50-1, 55
theory of numbers 
additive 254, 338, 361
multiplicative 338 

ϑ(x) [sum of log p for p up to x] 346, 451
order of magnitude 453-5 


t(m) [set of numbers less than and prime to m] 126

trace of Frobenius 591 
transcendental number 203 

aggregate of, not enumerable 205 
construction 206-8 
e 218-22 
examples 208, 227 
π 223-7
powers 228 


uniform distribution 520, 522 
in k dimensions 522 
of multiples of an irrational number 
520-2 

unimodular transformation 34 
unique factorization 231
in quadratic fields 294-5 
see also fundamental theorem of 
arithmetic 
unities 

of k(i) 233, 235
of k(√2) 270
of k(√5) 288
of k(√m) 268


vector 502, 513

visible point of lattice 36, 535, 541
number of, in bounded region 541-3

v(k) [number of signed kth powers to represent all numbers] 431
bounds for v(5) 435 
existence 431-2 
upper bounds 433-5 
von Staudt’s theorem 115-19 
history 119 
vulgar fraction 28 
V(ξ) 486


Waring’s problem 393-5, 416, 444-9

see also representation of integers; 
squares 

Weierstrass equation 557 
generalized 557 
discriminant 558 
minimal 558 
Wilson’s theorem 85-6, 101, 110

generalized 132, 137 
history 101, 119 
Lagrange’s proof 110-11 
mod p^2 101, 135-6
Wolstenholme’s theorem 112-14 
generalizations 130-2, 133, 134 
history 119 


zeta function, see ζ(s)
ζ(s) [Riemann zeta function] 320-1, 341
and arithmetical functions 326-8 
behaviour as s → 1 321-3, 341
Euler’s product 320 
value for s = 2n 320 (f.n.), 341